On 1/15/17, 12:07 PM, David Ahern wrote:
> IPv4 allows multipath routes to be deleted using just the prefix and
> length. For example:
>     $ ip ro ls vrf red
>     unreachable default metric 8192
>     1.1.1.0/24
>         nexthop via 10.100.1.254  dev eth1 weight 1
>         nexthop via 10.11.200.2  dev eth11.200 weight 1
>     10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
>     10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
>
>     $ ip ro del 1.1.1.0/24 vrf red
>
>     $ ip ro ls vrf red
>     unreachable default metric 8192
>     10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
>     10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
>
> The same notation does not work with IPv6 because of how multipath routes
> are implemented for IPv6. For IPv6 only the first nexthop of a multipath
> route is deleted if the request contains only a prefix and length. This
> leads to unnecessary complexity in userspace dealing with IPv6 multipath
> routes.
>
> This patch allows all nexthops to be deleted without specifying each one
> in the delete request by passing a new flag, RTM_F_ALL_NEXTHOPS, in
> rtm_flags. Internally, this is done by walking the sibling list of the
> route matching the specifications given (prefix, length, metric, protocol,
> etc).
>
> With this patch (and an updated iproute2 command):
>     $  ip -6 ro ls vrf red
>     2001:db8::/120 via 2001:db8:1::62 dev eth1 metric 256  pref medium
>     2001:db8::/120 via 2001:db8:1::61 dev eth1 metric 256  pref medium
>     2001:db8::/120 via 2001:db8:1::60 dev eth1 metric 256  pref medium
>     2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
>     ...
>
>     $ ip -6 ro del vrf red 1111::1/120
>     $ ip -6 ro ls vrf red
>     2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
>     ...
>
> The flag is added to fib6_config by converting fc_type to a u16 (as
> noted fc_type only uses 8 bits). The new u16 hole is a bitmap with
> fc_delete_all_nexthop as the first bit.
>
> Suggested-by: Dinesh Dutt <dd...@cumulusnetworks.com>
> Signed-off-by: David Ahern <d...@cumulusnetworks.com>
> ---
> v2
> - switched example to rfc 3849 documentation address per request
> - changed delete loop to explicitly look at siblings list for
>   first route matching specs given (metric, protocol, etc)
>
>  include/net/ip6_fib.h          |  4 +++-
>  include/uapi/linux/rtnetlink.h |  1 +
>  net/ipv6/route.c               | 28 +++++++++++++++++++++++++---
>  3 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
> index a74e2aa40ef4..11ab99e87c5f 100644
> --- a/include/net/ip6_fib.h
> +++ b/include/net/ip6_fib.h
> @@ -37,7 +37,9 @@ struct fib6_config {
>       int             fc_ifindex;
>       u32             fc_flags;
>       u32             fc_protocol;
> -     u32             fc_type;        /* only 8 bits are used */
> +     u16             fc_type;        /* only 8 bits are used */
> +     u16             fc_delete_all_nexthop : 1,
> +                     __unused : 15;
>  
>       struct in6_addr fc_dst;
>       struct in6_addr fc_src;
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 8c93ad1ef9ab..7fb206bc42f9 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -276,6 +276,7 @@ enum rt_scope_t {
>  #define RTM_F_EQUALIZE               0x400   /* Multipath equalizer: NI      
> */
>  #define RTM_F_PREFIX         0x800   /* Prefix addresses             */
>  #define RTM_F_LOOKUP_TABLE   0x1000  /* set rtm_table to FIB lookup result */
> +#define RTM_F_ALL_NEXTHOPS   0x2000  /* delete all nexthops (IPv6) */
>  
Do we really need the flag ?. It seems like delete with just prefix should 
delete all the routes in a multipath
route by default... (understand that you have it there to preserve existing 
behavior...for people who maybe relying on it. But this seems more like a bug 
fix. route replace went through a few such bug fixes "ipv6: fix ECMP route 
replacement"). ok with either approach.

Reply via email to