RE: CAN for Path & Service instance selection: RE: [Dyncast] CAN BoF issues #7 #17 #32

Dirk Trossen Fri, 17 Jun 2022 08:24:55 -0700

Linda,

I'm not arguing the compute awareness part. But your algorithm still just 
selects one instance over another. The compute unit based selection I mentioned 
before (presented at IFIP networking) does the same, based on relative compute 
capabilities.


But both, i.e., also your example, fall under mode 1 as I outlined it. My point 
is that you are not selecting paths but endpoints (while the underlay has 
already done the path selection to each of the possible endpoints).

Of course, you can use available mechanisms to further signal routing hints to 
send packets to whatever instance your mechanism determined, as also Joel 
pointed out, but that's known stuff.

Best

Dirk


From:Linda Dunbar <[email protected]>
To:Dirk Trossen <[email protected]>;liupengyjy 
<[email protected]>;dyncast <[email protected]>
Cc:rtgwg <[email protected]>;David R. Oran <[email protected]>;jefftant.ietf 
<[email protected]>
Date:2022-06-17 16:59:43
Subject:CAN for Path & Service instance selection: RE: [Dyncast] CAN BoF issues 
#7 #17 #32


Dirk,

How about considering CAN as path selection based not only on the routing 
distance but also on the running environment of the destinations? That would 
make it truly Computing-Aware Networking.

Using your example: S_i and S_j are both instances for the Service S.

Here is one example algorithm to compute the cost to reach S-i relative to S-j. 
When S-i is compared with itself (j = i), the cost is 1. So, if the formula 
returns a value less than 1, the cost to reach S-i is less than reaching S-j.

                 CP-j * Load-i                 Pref-j * Network-Delay-i
 Cost-i = w * ( --------------- ) + (1-w) * ( -------------------------- )
                 CP-i * Load-j                 Pref-i * Network-Delay-j

Load-i: Load index at S-i, which can be queried from another source such as 
DYNCAST’s DM-A.

CP-i: capacity index at S-i, a higher value means higher capacity.

Network-Delay-i: Network latency measurement (RTT) to the egress router for S-i.

Pref-i: Preference index for the S-i, a higher value means higher preference.

w: Weight balancing the computing and networking contributions to the path 
selection, a value between 0 and 1. If w is smaller than 0.5, network latency 
and the site preference have more influence; otherwise, the computing metrics 
have more influence.
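
As a minimal sketch (not part of the original message), the formula above can 
be evaluated as follows. The Instance class, its field names and the sample 
numbers are assumptions made purely for illustration; only the arithmetic 
follows the formula as written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Hypothetical per-instance inputs; fields mirror the definitions above."""
    name: str
    load: float      # Load-i: load index at the instance
    capacity: float  # CP-i: capacity index, higher means more capacity
    delay: float     # Network-Delay-i: RTT to the egress router (e.g. in ms)
    pref: float      # Pref-i: preference index, higher means more preferred


def relative_cost(i: Instance, j: Instance, w: float) -> float:
    """Evaluate Cost-i, the cost of reaching instance i relative to instance j."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be between 0 and 1")
    compute_term = (j.capacity * i.load) / (i.capacity * j.load)
    network_term = (j.pref * i.delay) / (i.pref * j.delay)
    return w * compute_term + (1.0 - w) * network_term


# Illustrative values only: S-i is less loaded but further away than S-j.
s_i = Instance("S-i", load=20.0, capacity=100.0, delay=10.0, pref=1.0)
s_j = Instance("S-j", load=40.0, capacity=100.0, delay=5.0, pref=1.0)

for weight in (0.5, 0.8):
    print(weight, relative_cost(s_i, s_j, w=weight))
```

With these sample values the compute term is 0.5 and the network term is 2.0, 
so the result is above 1 for w = 0.5 and drops below 1 once w is large enough 
for the compute term to dominate.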


Linda

From: Dirk Trossen <[email protected]>
Sent: Friday, June 17, 2022 2:02 AM
To: Linda Dunbar <[email protected]>; [email protected]; 
dyncast <[email protected]>
Cc: rtgwg <[email protected]>; David R. Oran <[email protected]>; jefftant.ietf 
<[email protected]>
Subject: RE: [Dyncast] CAN BoF issues #7 #17 #32

Hi Linda, Peng, all,

Let us tease apart what “include the path selection” may mean, since the nature 
of this inclusion may differ significantly.

For this, let us assume a service instance S_1 as one of possibly several ones 
for service S. S_1 may be reachable over a number of network paths, the 
selection of some of which would significantly impact any compute-aware 
selection of S_1 over the other available service instances for S. I can see 
two modes of ‘including path selection’:


  1.  S_1 exposes two (or more) IP addresses, where each IP address reflects a 
path from the client to the exposed address. IP addresses may be exposed across 
more than one network operator, multi-homing the service instance. Now here, 
‘path selection’ is indirectly done by picking one IP address over all others, 
including the IP addresses of other service instances, and indeed, such 
indirect path selection may well be done through a metric that measures against 
(at least one) crucial path-related metric. But ultimately, the CAN provider 
still selects one of possibly many IP addresses, right? More importantly, it 
remains the task of the underlay routing infrastructure (again, which could 
include more than one network operator) to determine what it deems as the 
‘best’ path to each of the IP addresses (including the multi-homed S_1 
addresses).
  2.  Let’s stick with one IP address for S_1 now, although there are still at 
least two possible paths to it, where the selection of one over any of the 
other possible ones could well impact the compute-aware suitability of S_1 over 
any of the other service instances. The problem here is that ‘including the 
path selection’ would mean to impact the routing to the single S_1 IP address 
such that the routing decision takes the compute-awareness into account. The 
path selection here is not indirect but direct, together with the IP address 
(i.e., service instance endpoint) selection. What is required here is that CAN 
provider and underlay somehow work together in selecting one path over another 
(to the same IP address), which in turn would mean to impact the overall 
routing decision for S_1’s IP address, which in turn would mean to impact the 
underlay routing infrastructure since the resulting (compute-aware) path 
configuration, in the form of suitable forwarding entries, needs distribution 
in the underlay infrastructure.
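
To make mode 1 concrete, here is a minimal sketch, under stated assumptions, 
of endpoint selection over exposed IP addresses. The Candidate type, the 
scoring rule, the weight w and all numbers are hypothetical; the addresses are 
taken from the documentation ranges. The point it illustrates is only that the 
selector picks an address, while the delay of each path is an input that the 
underlay has already determined.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    instance: str         # service instance that exposes the address
    address: str          # exposed IP address (documentation-range example)
    path_delay_ms: float  # delay of the path the underlay chose to this address
    load: float           # compute load of the instance, 0.0 (idle) to 1.0 (full)


def select_address(candidates: list[Candidate], w: float = 0.5) -> Candidate:
    """Pick one exposed address; lower score is better (hypothetical metric)."""
    max_delay = max(c.path_delay_ms for c in candidates)

    def score(c: Candidate) -> float:
        # Weighted mix of compute load and delay normalized to the worst path.
        return w * c.load + (1.0 - w) * (c.path_delay_ms / max_delay)

    return min(candidates, key=score)


candidates = [
    Candidate("S_1", "192.0.2.10", 12.0, 0.7),     # S_1 via operator A
    Candidate("S_1", "198.51.100.10", 30.0, 0.7),  # S_1 via operator B (multi-homed)
    Candidate("S_2", "203.0.113.20", 20.0, 0.2),   # another instance of S
]

print(select_address(candidates))         # compute and path metric mixed
print(select_address(candidates, w=0.0))  # path metric only
```

Choosing between the two S_1 addresses is the indirect path selection of mode 
1; nothing in the sketch influences how the underlay routes to any address, 
which is what mode 2 would require.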

I think we have to be clear which of the two options we see in the CAN scope, 
and also whether I may have missed options here. As we can see already from 
those two options, they have a significant impact on the architecture we may 
envision for CAN but also for its solution adoption. From my side, I have seen 
CAN mainly as an endpoint selection problem, so understood ‘path selection’ as 
an indirect one in the manner described in item 1. I just want to throw the 
options out here to solicit feedback from the community on this so that we get 
a good understanding moving forward.

Best,

Dirk

From: Dyncast [mailto:[email protected]] On Behalf Of Linda Dunbar
Sent: 15 June 2022 23:07
To: [email protected]; dyncast <[email protected]>
Cc: rtgwg <[email protected]>; David R. Oran <[email protected]>; jefftant.ietf 
<[email protected]>
Subject: Re: [Dyncast] CAN BoF issues #7 #17 #32

Peng,

For Issue #32, you said: “CAN does not compute path, it selects endpoints.”

If CAN means Computing Aware Networking, it should include the path selection. 
Maybe CAN is about selecting (or computing) the optimal paths based on the 
combination of network conditions and the computing resources available at the 
endpoints?

My two cents,

Linda

From: Dyncast <[email protected]> On Behalf Of [email protected]
Sent: Monday, June 13, 2022 10:00 PM
To: dyncast <[email protected]>
Cc: rtgwg <[email protected]>; David R. Oran <[email protected]>; jefftant.ietf 
<[email protected]>
Subject: [Dyncast] CAN BoF issues #7 #17 #32

Dear All,

Here are the responses to issues #7 #17 #32, any comments are welcome!  The 
issues and responses are also copied to the questioner 
(https://datatracker.ietf.org/doc/minutes-113-can/),
 hope for further suggestions and confirmation. Thanks!

#7 This seems to assume conventional non-distributed applications just running 
at the edge. What about modern frameworks like Sapphire? and Ray? 
<https://github.com/CAN-IETF/CAN-BoF-ietf113/issues/7>
It would be good to understand the multi-site requirements of such frameworks, 
which seem to mainly run in single DCs.

#17 Whether the interests of the organization deploying the application and the 
organization providing the network connectivity are aligned. Google doesn't 
worry about this because they are both. 
<https://github.com/CAN-IETF/CAN-BoF-ietf113/issues/17>
The question is more what the scope and semantics of the information are that 
will need to cross organizational boundaries. This needs further study, in 
particular when assuming stakeholder division between service and network 
provider.

#32 How to effectively compute paths? Shall we take CPUs into account? 
<https://github.com/CAN-IETF/CAN-BoF-ietf113/issues/32>
CAN does not compute paths, it selects endpoints. Path selection (to a given 
endpoint) is subject to the routing at the IP underlay. For selecting 
endpoints, CPU information may be taken into account to achieve the 
'compute-awareness' that CAN strives for.

You can also add your comments to any of them 
(https://github.com/CAN-IETF/CAN-BoF-ietf113/issues).

Regards,
Peng

________________________________
[email protected]

From: Linda Dunbar <[email protected]>
Date: 2022-05-11 06:11
To: [email protected]
Subject: [Dyncast] Categories of the CAN BoF issues
CAN BoF proponents:

Many thanks for creating the CAN BoF issue tracking on GitHub: 
https://github.com/CAN-IETF/CAN-BoF-ietf113/issues/created_by/CAN-IETF?page=1&q=is%3Aopen+is%3Aissue+author%3ACAN-IETF

I went through the issues captured on GitHub and categorized them into 
groups. Some issues can be lumped together for the discussion. There are quite 
a few issues related to the requirements, which need to be clarified.

Best Regards, Linda


Issues associated with Applications vs. Underlay networks:

·         Consider not to load underlay network with application details. #35

·         We have multiple upper-layer applications. Do we have additional 
needs for routing (e.g., a WG?), or are we using those applications and won't 
need such a new WG? #30

·         It needs application information too, so it can't just make a 
decision at the network layer. #23

·         This does not strike me as a routing problem; it's all service 
discovery that can be done in higher layers. #21

·         3GPP and URSP solve this based on UPF selection. It uses both 
endpoint + application. #20

·         One overlay plane per application. Resources/metric specific to the 
plane. #19

·         How does the application layer or the transport layer learn the 
network status to steer traffic? #16

Need clearer requirements for CAN (to be addressed by 
draft-liu-dyncast-ps-usecases):

·         Need to understand if there are requirements to avoid extra messages 
or 1ms of latency #36

·         Regarding the flow affinity, is it from network perspective or from 
application/computation perspective? #33

·         How to effectively compute paths? Shall we take CPUs into account? #32

·         What happens when the user moves? If so we also need to move 
application context. #25

·         It can only move the services around as fast as it can update the 
routing plane, which comes back to the point about service discovery (waiting 
for convergence/distribution as opposed to just updating the SD server) #24

·         Whether the interests of the organization deploying the application 
and the organization providing the network connectivity are aligned. Google 
doesn't worry about this because they are both. #17

o    The question is more what the scope and semantics of the information are 
that will need to cross organizational boundaries. This needs further study, in 
particular when assuming stakeholder division between service and network 
provider.

·         It seems impossible to satisfy that requirement simultaneously with 
the latency requirement. #15

·         It wasn't clear how hard a requirement session persistence is. #13

o    A session usually creates ephemeral state. If execution changes from one 
(e.g., virtualized) service instance to another, state/context needs to be 
transferred to the new instance. Such required transfer of state/context makes 
it desirable to have 
session persistence (or instance affinity) as the default, removing the need 
for explicit context transfer, while also supporting an explicit state/context 
transfer (e.g., when metrics change significantly).

·         Should it select UPF based on the application? Steering is done per 
user? or per application? #9

·         This seems to assume conventional non-distributed applications just 
running at the edge. What about modern frameworks like Sapphire? and Ray? #7

o    It would be good to understand the multi-site requirements of such 
frameworks, which I have understood to mainly run in single DCs.

·         Relation to 3GPP UPF #6

·         Relation to ALTO #5

·         Are the mobility issues and associated protocols also in scope? 
There are scenarios where routing alone would not be sufficient. #4

·         What is the position of the edge location relative to the UPF? #3

·         Is there some sort of authorization model so that an edge can 
indicate whether or not it will provide compute services? #2

·         What is CNC and the relationship with CAN #1


Measurement of the Computing Resources (to be addressed by 
draft-du-computing-resource-representation):

·         It is hard to use existing work to measure the computation, but we 
can optimize the latency through the performance monitoring. We have 
performance/measurement matrix over there. #34

·         Clarifications on the computing resource, its requirements and 
characteristics would be helpful. #27

·         Each application may have a different definition of "resources" these 
then have to be boiled down into a single topology Network Aware Computing 
(NAC! :) does scale #14

·         Is computing resource measurable? #10

o    It is, and how to use the measurement would be solution related. See IFIP 
Networking 2022 paper on how to simply expose “computing capability” and 
achieve better steering with such simple measure.

·         Why is a compute resource different from other resources? #8

Load Balance based solutions:

·         The point is that we need a standardized LB protocol #18

·         The LB as part of the application itself is superior (part of the 
distributed application itself is to obtain and keep updating the "best" 
unicast location to use). #22

·         Is there anything missing from current LBs that would prevent 
their use as-is, other than that, for market reasons, there is no interop 
standard between different LBs? #12

·         For the load balancer, should it learn the network’s status? #11

Dyncast based Solution issues:

·         For Dyncast, when the time is short, is it possible for the router to 
decide the routing? It is too fast. #31

·         Is dyncast proposed to encapsulate? #29

·         Will CAN dyncast impact each and every router? How to avoid loops? #28

·         What's the assumed scale of a D-router? 10^6 sessions? 100^8? 
What's the assumed update rate? 1Gb? 1Tb? #26

_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
