Hi Venky,

Which nodes get called depends on the nature of the route. Take this trivial example:

  DBGvpp# set int state loop0 up
  DBGvpp# set int ip addr loop0 10.0.0.1/24

This route is non-recursive:

  DBGvpp# ip route 1.1.1.1/32 via 10.0.0.2 loop0

This one is recursive:

  DBGvpp# ip route 2.2.2.2/32 via 10.0.0.2

You can see that the DPO graphs built for them are different.

  DBGvpp# sh ip fib 1.1.1.1
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[recursive-resolution:1, default-route:1, ]
  1.1.1.1/32 fib:0 index:11 locks:2
    CLI refs:1 src-flags:added,contributing,active,
      path-list:[16] locks:2 flags:shared, uPRF-list:12 len:1 itfs:[1, ]
        path:[18] pl-index:16 ip4 weight=1 pref=0 attached-nexthop: oper-flags:resolved,
          10.0.0.2 loop0
        [@0]: arp-ipv4: via 10.0.0.2 loop0

   forwarding: unicast-ip4-chain
    [@0]: dpo-load-balance: [proto:ip4 index:13 buckets:1 uRPF:12 to:[0:0]]
      [0] [@3]: arp-ipv4: via 10.0.0.2 loop0

The recursive prefix has an extra load-balance object:

  DBGvpp# sh ip fib 2.2.2.2
  ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none locks:[recursive-resolution:1, default-route:1, ]
  2.2.2.2/32 fib:0 index:12 locks:2
    CLI refs:1 src-flags:added,contributing,active,
      path-list:[18] locks:2 flags:shared, uPRF-list:15 len:1 itfs:[1, ]
        path:[20] pl-index:18 ip4 weight=1 pref=0 recursive: oper-flags:resolved,
          via 10.0.0.2 in fib:0 via-fib:13 via-dpo:[dpo-load-balance:14]

   forwarding: unicast-ip4-chain
    [@0]: dpo-load-balance: [proto:ip4 index:15 buckets:1 uRPF:15 to:[0:0]]
      [0] [@12]: dpo-load-balance: [proto:ip4 index:14 buckets:1 uRPF:14 to:[0:0]]   <<<< HERE
        [0] [@3]: arp-ipv4: via 10.0.0.2 loop0

The first LB in the DPO graph is the one used in ip4-lookup; the second one, when present, is used in ip4-load-balance.

The golden rule of forwarding is to do all the heavy lifting in the control plane so that you can do the absolute minimum in the data plane (DP), because down there extra work means worse performance.
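
As a rough illustration of the two-stage walk described above, here is a minimal C sketch of a load-balance object whose bucket points either at an adjacency or at another load-balance object. The types, the function names and the adjacency id 100 are all hypothetical and are not VPP's real data structures; reusing one hash value at both stages is also a simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; these are NOT VPP's real types. */
typedef enum { DPO_LOAD_BALANCE, DPO_ADJACENCY } dpo_type_t;

typedef struct {
  dpo_type_t type;
  uint32_t index;      /* index into the LB table, or an adjacency id */
} dpo_t;

typedef struct {
  uint32_t n_buckets;  /* assumed to be a power of two */
  dpo_t buckets[4];
} lb_t;

/* Pick one bucket of a load-balance object with a simple mask. */
static const dpo_t *
lb_pick (const lb_t *lb, uint32_t flow_hash)
{
  return &lb->buckets[flow_hash & (lb->n_buckets - 1)];
}

/* Resolve a flow starting at LB 'first_lb'. The first pick models the
 * lookup stage; each further pick models an extra load-balance stage.
 * Returns the adjacency id and stores the number of stages walked. */
static uint32_t
resolve (const lb_t *lbs, uint32_t first_lb, uint32_t flow_hash, int *stages)
{
  const dpo_t *d = lb_pick (&lbs[first_lb], flow_hash);
  *stages = 1;
  while (d->type == DPO_LOAD_BALANCE)
    {
      d = lb_pick (&lbs[d->index], flow_hash);
      (*stages)++;
    }
  return d->index;
}

/* Table modelled on the two routes above (array slots and the
 * adjacency id 100 are made up for the example):
 *   slot 0 ~ LB 13, 1.1.1.1/32: one bucket, straight to the adjacency
 *   slot 1 ~ LB 15, 2.2.2.2/32: one bucket, pointing at slot 2
 *   slot 2 ~ LB 14, the via-entry for 10.0.0.2 */
static const lb_t example_lbs[] = {
  { 1, { { DPO_ADJACENCY, 100 } } },
  { 1, { { DPO_LOAD_BALANCE, 2 } } },
  { 1, { { DPO_ADJACENCY, 100 } } },
};
```

Calling resolve (example_lbs, 0, hash, &stages) models the non-recursive route and walks one stage; starting from slot 1 models the recursive route and walks two.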

The load-balance bucket choice is basically choice = buckets[flow-hash & (n_buckets - 1)] (the bucket count is a power of two, so this is just a mask), which is the minimum we can do. I would advise that you try to keep it this way by:

1 – choosing how the flow hash is calculated (see the list of ‘algorithms’ available in ip_flow_hash_set)
2 – choosing in the control plane (CP) how the buckets are populated.

If 2 is an option, we can talk more about how you might go about doing that 😊

Are you looking to change the bucket selection for all prefixes, or for most, some or a few?

/neale

From: Venky Venkatesh <vvenkat...@paloaltonetworks.com>
Date: Tuesday, 7 September 2021 at 20:11
To: Neale Ranns <ne...@graphiant.com>
Cc: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Question regarding ip4/ip6 nexthop load balance flexibility #vnet

Hi Neale,

Thanks for getting back. I suspected that cloning code (even though I do a graph node replacement) is not how someone knowledgeable in the area would do it.

When I looked again at the code, the area I intended to change seems to be a node in itself. Is this code for ECMP? (It appears so.)

  VLIB_REGISTER_NODE (ip4_load_balance_node) =
  {
    .name = "ip4-load-balance",
    .vector_size = sizeof (u32),
    .sibling_of = "ip4-lookup",
    .format_trace = format_ip4_lookup_trace,
  };

For some reason it doesn't show up in https://docs.fd.io/vpp/21.06/d9/db0/nodes.html, which is how I missed it the first time around.

The localized changes that I was intending to make are in load_balance_get_bucket_i and load_balance_get_fwd_bucket (i.e. having computed the hash, the way I select the next hop could differ). I could also entirely outsource the hash computation and next-hop calculation to my code (in which case that also gets included in my changes).

The load_balance_* code gets invoked from ip4_lookup_* and also from ip4_load_balance_*. Am I right that if I want to change the next-hop selection, I would replace ip4_load_balance_* with my implementation? However, I do not see this chained in the graph.

Look forward to your response.

Thanks
-Venky

On Sun, Sep 5, 2021 at 6:52 AM Neale Ranns <ne...@graphiant.com> wrote:

Hi Venky,

There are several ways you might go about this, but I would counsel that cloning code is a last resort: it would be a maintenance headache if you replace the current node, and using your replacement node would bring its own set of challenges.

If you could give me some more detail on how you want to change the next-hop selection, I will gladly give you some specific advice.

/neale

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> on behalf of Venky Venkatesh via lists.fd.io <vvenkatesh=paloaltonetworks....@lists.fd.io>
Date: Wednesday, 1 September 2021 at 16:48
To: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io>
Subject: [vpp-dev] Question regarding ip4/ip6 nexthop load balance flexibility #vnet

Hi,

I am new to fd.io, so please pardon my ignorance.

I wanted to experiment with different next-hop selection algorithms once the destination IP is looked up. I looked at the ip4 code. My changes would be localized to how the next hop is determined after the hash is calculated (currently it goes through the load_balance code).

As I understand it, the ip4_lookup graph node has both the lookup and the next-hop selection bundled into one. If that is correct, is the only way to try out alternate next-hop selection algorithms to create a graph node which is practically a clone of ip4_lookup and then change the required portions in the cloned node code? If so, is this workflow in line with the way flexibility was intended? Any ideas welcome.

Thanks
-Venky

#vnet
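
To make option 2 from Neale's reply (choosing in the CP how the buckets are populated) a little more concrete, here is a minimal C sketch, not taken from VPP. It assumes a fixed power-of-two bucket count and path weights that already sum to that count; the names path_t, fill_buckets and pick_next_hop are hypothetical. The point is that the selection policy lives entirely in how the control plane fills the table, while the data-plane pick stays a single mask and array read.

```c
#include <assert.h>
#include <stdint.h>

#define N_BUCKETS 8u   /* power of two, so selection is a mask */

/* Hypothetical path description used only for this sketch. */
typedef struct {
  uint32_t next_hop;   /* made-up next-hop id */
  uint32_t weight;     /* number of buckets this path should own */
} path_t;

/* Control plane: give each path 'weight' consecutive buckets.
 * Simplifying assumption: the weights must sum to N_BUCKETS.
 * Returns 0 on success, -1 if the weights do not sum to N_BUCKETS. */
static int
fill_buckets (const path_t *paths, uint32_t n_paths,
              uint32_t buckets[N_BUCKETS])
{
  uint32_t total = 0, b = 0;

  for (uint32_t i = 0; i < n_paths; i++)
    total += paths[i].weight;
  if (total != N_BUCKETS)
    return -1;

  for (uint32_t i = 0; i < n_paths; i++)
    for (uint32_t w = 0; w < paths[i].weight; w++)
      buckets[b++] = paths[i].next_hop;
  return 0;
}

/* Data plane: unchanged regardless of how the buckets were filled. */
static uint32_t
pick_next_hop (const uint32_t buckets[N_BUCKETS], uint32_t flow_hash)
{
  return buckets[flow_hash & (N_BUCKETS - 1)];
}
```

With two paths weighted 6 and 2, six of the eight buckets map to the first next hop, so roughly three quarters of evenly hashed flows would take it. A different policy would only change fill_buckets.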