On 22.07.2011 14:52, Ondrej Zajicek wrote:
On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
Therefore there would be two types of routing tables - IP and MPLS. I
don't think it is a good idea to mix these. This may look inconsistent
with the idea of embedding IPv4 into IPv6, but the IP protocols are much more
similar, have a natural way to embed one in the other, and have similar
roles and protocol structure. The MPLS routing table could be used for LDP -
kernel interaction (routes imported from LDP and exported to the kernel).
This solves your Case 2 without any hacks.
So, from the user's point of view, I define
table xxx; for both IPv4 and IPv6 routes and
mpls table yyy; for an MPLS routing table?

Yes.

There should be a base MPLS rtable (mpls_default, for example), as in IP.
We can also add a hack to automatically subscribe protocols to the MPLS
routing table by type and other attributes. For example, every LDP
instance gets connected to an MPLS table (default or defined in config).
A kernel protocol instance gets connected to an MPLS table only if its IP
table is the default one (GRT) or the 'mpls table' keyword is supplied
explicitly. What about VPNv4/VPNv6? The same approach?

Perhaps even the default MPLS table should be explicitly configured [*] (as I guess
not many BIRD users would use MPLS). Protocols requiring an MPLS table would
fail if it is not configured; protocols with optional MPLS support (kernel,
static?) just do not connect to MPLS in that case. The same approach
for the VPNvX table.

[*] probably like: mpls table XXX default;
Maybe it's better to turn on "general" MPLS support?
e.g. 'mpls support;' or just 'mpls;' instead of marking some table as the default?

Btw, how will we distinguish inet/inet6 rtes? (I'm talking about the MP-BGP
/ IPv4-mapped cases)

I planned to use the IPv4-mapped prefix (::ffff:0:0/96), which is used for
similar purposes in the IP stack. But this should not be checked directly
in protocols; there should be some macros in lib/ipv6.h for that.
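
A rough sketch of what such macros could look like (the macro names below are
placeholders, not existing BIRD API; the _I0().._I3() word accessors and
ipa_build() are assumed from lib/ipv6.h):

/* Hypothetical helpers for lib/ipv6.h: an IPv4 route embedded in an IPv6 table
 * uses the IPv4-mapped prefix ::ffff:0:0/96. */
#define IPA_IS_V4MAPPED(a)  (_I0(a) == 0 && _I1(a) == 0 && _I2(a) == 0xffff)
#define IPA_TO_V4(a)        _I3(a)                        /* extract the embedded IPv4 address */
#define IPA_FROM_V4(x)      ipa_build(0, 0, 0xffff, (x))  /* embed an IPv4 address */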

[*] when I wrote that, I thought that labels are distributed just by LDP
and that the purpose of a label request is to propagate the label through the
LDP area. I didn't notice that BGP/MPLS also distributes labels, so those
protocols need to know the assigned labels. So the idea would need some modifications.
Not sure this will work. Since t1 is an IP table, the cases where we need to
request a specific label for:
* AToM
* RSVP-TE tunnels
will not work, since there are no prefixes that can be mapped to such a
request.

You are probably right. I originally thought about some specific
'request table' (where requests are coded as routes with a specific AF),
but perhaps some other mechanism / other protocol hook should be used.
It should be generic enough, though (some bus allowing at least multiple
'producers' and perhaps multiple 'consumers').
Okay, I see this as follows:
A new rtable hook, service_hook, with a uint32_t bitmask specifying the request classes we are responsible for:
/* Defined classes */
#define RCLASS_LABEL 0x01 /* MPLS label request */

Some request function:
int
request_data(rtable *t, struct service_request *req, void **buf, size_t *bufsize);

struct service_request {
   uint32_t    request;   /* Single request class (one bit set) */
   uint32_t    subclass;  /* Subclass specific to the request */
   proto       *p;        /* caller protocol */
   char        data[0];   /* request-specific data follows */
};

The function loops through all registered hooks for the given class, checking for a reply, until SR_OK or SR_FAIL is returned. It is up to the protocol hook to check the subclass.
#define SR_OK      0x01 /* Request successful */
#define SR_FAIL    0x02 /* Request failed */
#define SR_NEXT    0x03 /* Request skipped */
#define SR_UNAVAIL 0x04 /* No providers for this request */

As a result, the caller gets SR_UNAVAIL if no provider was able to serve the request, or SR_OK/SR_FAIL otherwise.

The caller can either set up the buffer itself and pass a pointer to the buffer pointer and a pointer to the buffer size to the function, or ask the provider to allocate the data by setting *buf to NULL and *bufsize to 0.

struct service_reply { /* returned in the reply buffer */
  uint32_t    request;
  uint32_t    subclass;
  proto       *p;        /* protocol providing the data */
  char        data[0];   /* request-specific data */
};
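
A rough sketch of how the dispatch could look (everything here is the proposed
API, not existing BIRD code; the service_hooks list on the rtable, struct
service_hook and its serve() callback are assumptions, while WALK_LIST() is the
existing list-walking macro from lib/lists.h):

struct service_hook {                  /* registered by a provider protocol */
  node n;                              /* member of t->service_hooks */
  uint32_t classes;                    /* bitmask of RCLASS_* values served */
  proto *p;                            /* provider protocol */
  int (*serve)(proto *p, struct service_request *req, void **buf, size_t *bufsize);
};

int
request_data(rtable *t, struct service_request *req, void **buf, size_t *bufsize)
{
  struct service_hook *h;
  int rv;

  WALK_LIST(h, t->service_hooks)
    {
      if (!(h->classes & req->request))   /* provider does not serve this class */
        continue;

      rv = h->serve(h->p, req, buf, bufsize);
      if (rv == SR_OK || rv == SR_FAIL)   /* definitive answer, stop the walk */
        return rv;
      /* SR_NEXT: the provider skipped this request, try the next one */
    }

  return SR_UNAVAIL;                      /* no provider accepted the request */
}

A caller asking for a label binding would then call request_data(tab, &req,
&reply, &replylen) with req.request = RCLASS_LABEL, and with *reply = NULL and
replylen = 0 if it wants the provider to allocate the buffer.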




The internal LMAP table is examined, and the tracked IGP table is examined. If both
are ready (for the given prefix), the appropriate encapsulating and MPLS routes
are generated and propagated using rte_update(); otherwise nothing is
generated and the previously generated route is withdrawn (rte_update()
is called with NULL), or perhaps an unreachable route is generated if the
LMAP entry is present but the IGP route is missing. Simple and elegant.
.. and in case of a label release we should remove only the label and keep the
original route

Yes.
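
A rough sketch of the generate-or-withdraw step described above (the LDP-side
helpers and struct names are hypothetical; the five-argument rte_update() form
is recalled from the current 1.3.x API and may differ):

static void
ldp_update_fec(struct ldp_proto *p, net *n)
{
  struct ldp_binding *lb = ldp_lmap_lookup(p, n->n.prefix, n->n.pxlen); /* local label mapping */
  rte *igp = ldp_best_igp_route(p, n);   /* best IGP route, ignoring encapsulating ones */

  if (lb && igp)
    {
      /* both parts ready: build the encapsulating (RTD_MPLS) route and announce it */
      rte *e = ldp_make_encap_route(p, n, igp, lb);
      rte_update(p->p.table, n, &p->p, &p->p, e);
    }
  else
    /* label released or IGP route gone: withdraw only what we generated earlier,
     * the original IGP route stays in the table */
    rte_update(p->p.table, n, &p->p, &p->p, NULL);
}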

There are some tricky parts of the IGP tracking - it is problematic
to use the standard RA_OPTIMAL update for this purpose, because if the
generated encapsulating routes are imported into the same table,
they would probably become the optimal ones and the IGP routes would be
shaded. The solution would be to use RA_ANY and ignore notifications
containing encapsulating routes; similarly, 'examining the tracked
IGP table' means looking up the fib node and finding the best route,
ignoring the encapsulating ones.
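
For illustration, that "find the best route, ignoring encapsulating ones" step
could be a helper like this (RTD_MPLS is the destination type proposed later in
this thread; the sketch assumes the best route is kept first in net->routes, as
rte_recalculate() does today):

static rte *
best_nonencap_route(net *n)
{
  rte *e;

  for (e = n->routes; e; e = e->next)
    if (e->attrs->dest != RTD_MPLS)   /* skip our own encapsulating routes */
      return e;                       /* list keeps the best route first */
  return NULL;
}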

To implement this behavior, two minor changes need to be made
to the rt-table code: First, accept_ra_types
(RA_OPTIMAL/RA_ANY) is currently a property of a protocol; it needs to become a
property of an announce hook (as LDP would have two hooks with
RA_OPTIMAL and one hook with RA_ANY). Second, both rte_announce()
calls in rte_recalculate() should be moved after the route list
is updated/relinked.

Agreed. Distinguishing RA_OPTIMAL and RA_ANY in the current code is not a
trivial task and requires an understanding of the internals. Either the
announce type should be passed to the announce hook, or a new hook should be
added for the RA_ANY event. The latter is more appropriate IMHO, since RA_ANY
is used only by the pipe protocol.

I thought about that when I created RA_ANY and chose this approach.
Probably the best way is just to change rt_notify to take the appropriate
struct announce_hook as its second argument instead of struct rtable.
struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
some protocol-specific data. As (probably) all protocols are in-tree,
making some wide but trivial changes is not a problem.
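
Sketched, the change could look roughly like this (the current rt_notify
signature and the struct announce_hook fields are recalled from nest/protocol.h
/ nest/route.h and may differ in detail; the 'type' field is the new part):

struct announce_hook {
  node n;                          /* node in the table's hook list */
  struct rtable *table;
  struct proto *proto;
  int type;                        /* NEW: RA_OPTIMAL or RA_ANY for this hook */
  struct announce_hook *next;      /* next hook of the same protocol */
};

/* before */
void (*rt_notify)(struct proto *, struct rtable *table, struct network *net,
                  struct rte *new, struct rte *old, struct ea_list *attrs);
/* after */
void (*rt_notify)(struct proto *, struct announce_hook *hook, struct network *net,
                  struct rte *new, struct rte *old, struct ea_list *attrs);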

The kernel protocol should track RA_ANY protocol hooks,
looking for the update source (LDP / RSVP), and re-install the appropriate
routes.

I think the kernel protocol should use RA_OPTIMAL as usual. This kind
of RA_ANY usage is for protocols that export routes to the same
table they listen to (so the 'source' routes would be shaded by their
routes). These routes (LDP / RSVP) should simply have the highest
priority.

The only downside is the situation when LDP signalling starts faster
than the IGP. In that case we will get 3 updates instead of one (at least with
RTSOCK):
* RTM_ADD for the original prefix
* RTM_DEL for this prefix (as part of krt_set_notify())
* RTM_ADD for the modified prefix

RTM_CHANGE can be used in notify, but still: this gives 2 updates
instead of one.

No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
are propagated synchronously depth-first:

OSPF --RA_ANY-->  LDP
LDP --RA_OPTIMAL-->  kernel
OSPF --RA_OPTIMAL-->  kernel

Still, I can't understand how exactly I can modify an announced IP route. (From the FreeBSD kernel's point of view, an encapsulated route is an ordinary route with an attribute attached; from the Linux point of view it should be more or less the same, since an IP route lookup has to be done for an incoming packet anyway, and doing several different lookups is not the best idea.) I've got the RA_ANY hook called for a new route (and I should know that it is actually RA_OPTIMAL without some complex logic!), what should I do next?

But it is true that this depends heavily on the internal implementation
of route propagation. The first idea I had was to use separate
tables for original and labeled routes (with just RA_OPTIMAL hooks),
but that looks too cumbersome for users, and the ability to push a better
route to the same (input) table has other possible uses.

Therefore, it is probably a good idea to extend FIBs in the way you
suggested, with minor details changed. FIBs / rtables would be uniform
(AF-bound), but there would be just three AFs (IP, MPLS, VPN) - IPv4 and IPv6
could be handled as one AF (embedded), the same for VPNv4 and VPNv6. To
minimize code changes, struct fib_node would have an ip_addr prefix, but
might be allocated larger.
Okay, so for an IPv4+IPv6-enabled daemon we will allocate an ip_addr large
enough to hold an IPv6 address? This can bump memory consumption significantly
for setups with several full views.

It increases memory consumption, but not so much in relative terms - for
each struct network there is at least one struct rte, between the two of them
there is just one ip_addr, and both structures are nontrivial. So the
relative increase would be about 1.15-1.2. Really big users would
probably keep the current split setup.
Okay, it's much easier from the developer's point of view. If you're not afraid of your users :)

Because each protocol and each of its announce hooks has an appropriate role,
it is IMHO unnecessary to have an AF in protocol hooks, but there should be a
check whether the protocol/announce_hook is connected to an appropriate rtable.


To summarize required changes (please correct me):
1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
* rtable
* fib
* rte
3) Add fib2_init with sizeof(AF object) supplied. Add an appropriate field
to struct fib to hold this value.
4) Move to memcmp() in fib_find / fib_get (see the sketch after this list)
5) Set up a default rtable for every supported AF. Connect protocol
instances to these default tables based on protocol type
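
For items 3 and 4, a rough sketch (addr_length and fib2_init() are the proposed
additions, not existing code; the hash-chain layout of fib_find() is recalled
from nest/rt-fib.c and simplified here):

static inline int
fib_key_equal(struct fib *f, ip_addr *a, ip_addr *b)
{
  /* addr_length = sizeof(AF object), stored by the proposed fib2_init() */
  return !memcmp(a, b, f->addr_length);
}

void *
fib_find(struct fib *f, ip_addr *a, int len)
{
  struct fib_node *e;

  for (e = f->hash_table[fib_hash(f, a)]; e; e = e->next)
    if (fib_key_equal(f, &e->prefix, a) && (e->pxlen == len))
      return e;
  return NULL;
}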

1a) other changes in rte_recalculate() related to propagation
(clean up the table before calling the RA_ANY hook).

1) and 1a) I will do myself and send you the patch, and also make
some trivial example of exporting to the same table.

2) I am not sure there is a reason to put explicit AF info
into struct fib; AF compatibility could be handled on a higher level
(struct rtable in general; other direct users probably use just
one AF).
No problem, I misinterpreted "FIB / rtables would be uniform (AF_ bound)" as "FIB / rtable needs AF info in structure fields".

3) and a hashing callback (and perhaps fib_route, but I am not sure this is
needed).

4) probably encapsulate that in some static inline key_equal() function.

5) see my related note above. Protocol binding to tables should check AFs.

more:

6) RTD_MPLS in the dest field, struct rta_mpls, as I wrote in the previous mail:

I think encapsulation
routes should be represented by routes with a new destination type
(RTD_MPLS in the dest field of struct rta), and the whole NHLFE should be stored
in a new struct rta_mpls (or rta_nhlfe), which would be an extension of
struct rta (containing struct rta in the first field and the NHLFE after
that). Such a structure could easily be passed as struct rta, and the functions
from rt-attr.c can work with it, with some minor modifications
(allocating, freeing and printing) dispatched based on the dest field.

This rta could also be used without changes for MPLS routes.
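
A rough sketch of such a structure (RTD_MPLS, the field names and
MPLS_MAX_LABEL_STACK are placeholders for illustration):

struct rta_mpls {
  struct rta a;                        /* must be first: &r->a can be passed
                                          wherever a struct rta is expected */
  u32 label_count;                     /* number of labels in the stack */
  u32 labels[MPLS_MAX_LABEL_STACK];    /* the label stack to push/swap */
  /* further NHLFE data (outgoing encapsulation etc.) could follow */
};

/* rta_mpls would be recognized by dest == RTD_MPLS, so rta_lookup()/rta_free()/
 * rta_dump() can dispatch allocation, freeing and printing based on the dest field. */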

I'll try to send you patches for all of this, as I see it, within several days.


Most of these are more or less trivial changes that are not MPLS-bound (VPNv4/6
can be used when BIRD is used as an RR in an MPLS network, for example).
Should I supply patches for these? What are your plans regarding the commit
roadmap?

I will create a Git branch 'mpls' and merge these patches into that branch
soon. When we have some major release, we could merge the 'mpls' branch
to master if there is sufficient usage (I think that even just
static and kernel protocol support for MPLS would be a good example of
usage). Other protocols (LDP, ...) should probably be merged when they
are reasonably ready.
Will this branch be available from the official Git repo? It is not accessible (from its web interface, at least).


Btw, some bird/LDP "status" report:

bird> show ldp neighbour
    Peer LDP Ident: 10.2.33.4:0; Local LDP Ident 10.0.0.88:0
         TCP connection: 10.2.33.4.11212 - 0.0.0.0.0
         State: Operational; Msgs sent/rcvd: 21/61; Downstream
         Up time: 00:02:27
         LDP discovery sources:
           em1, Src IP addr: 10.1.5.4
    Peer LDP Ident: 10.2.33.3:0; Local LDP Ident 10.0.0.88:0
         TCP connection: 10.2.33.3.11009 - 0.0.0.0.0
         State: Operational; Msgs sent/rcvd: 29/60; Downstream
         Up time: 00:02:20
         LDP discovery sources:
           em2, Src IP addr: 10.1.6.3
bird> show ldp bindings
  lib entry: 10.2.0.0/30
      local binding:  label: 25
      remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
      remote binding:  lsr: 10.2.33.3:0, label: 23
  lib entry: 10.1.6.0/24
      remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
      remote binding:  lsr: 10.2.33.4:0, label: 25
  lib entry: 10.0.0.0/24
      remote binding:  lsr: 10.2.33.3:0, label: 19
      remote binding:  lsr: 10.2.33.4:0, label: 23
  lib entry: 10.2.0.2/32
      local binding:  label: 26
      remote binding:  lsr: 10.2.33.4:0, label: 16
      remote binding:  lsr: 10.2.33.3:0, label: 24
  lib entry: 10.1.4.0/24
      local binding:  label: 29
      remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
      remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
  lib entry: 10.1.5.0/24
      remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
      remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
  lib entry: 1.2.3.5/32
      remote binding:  lsr: 10.2.33.3:0, label: 20
      remote binding:  lsr: 10.2.33.4:0, label: 21
  lib entry: 10.1.33.0/24
      local binding:  label: 28
      remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
      remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
  lib entry: 10.2.33.3/32
      local binding:  label: 31
      remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
  lib entry: 10.2.33.4/32
      local binding:  label: 27
      remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
      remote binding:  lsr: 10.2.33.3:0, label: 25
  lib entry: 10.1.6.88/32
      remote binding:  lsr: 10.2.33.3:0, label: 18
      remote binding:  lsr: 10.2.33.4:0, label: 19
  lib entry: 10.0.0.88/32
      remote binding:  lsr: 10.2.33.4:0, label: 17
      remote binding:  lsr: 10.2.33.3:0, label: 16
  lib entry: 10.1.5.88/32
      remote binding:  lsr: 10.2.33.3:0, label: 21
      remote binding:  lsr: 10.2.33.4:0, label: 18
bird> show ldp forwardingtable
Local  Outgoing       Prefix             Bytes Label    Outgoing   Next Hop
Label  Label or VC    or Tunnel Id       Switched       interface
20     SWAP           10.2.0.0/30        0              ?          10.1.5.4
21     SWAP           10.2.0.2/32        0              ?          10.1.5.4
22     SWAP           10.2.33.4/32       0              ?          10.1.5.4
23     SWAP           10.1.33.0/24       0              ?          10.1.5.4
24     SWAP           10.1.4.0/24        0              ?          10.1.5.4
25     SWAP           10.2.0.0/30        0              ?          10.1.5.4
26     SWAP           10.2.0.2/32        0              ?          10.1.5.4
27     SWAP           10.2.33.4/32       0              ?          10.1.5.4
28     SWAP           10.1.33.0/24       0              ?          10.1.5.4
29     SWAP           10.1.4.0/24        0              ?          10.1.5.4
30     SWAP           10.2.33.3/32       0              ?          10.1.6.3
31     SWAP           10.2.33.3/32       0              ?          10.1.6.3



