Before I resubmit this, let us see if anyone has an opinion regarding
the following:
As you might have noticed, I moved the initialization of nbr_self to a
later point until
all of the interface parameters are initialized, in ospfd.c when we add
interfaces to ospf.
I also added an assertion to delete neighbor function to make sure we
are not freeing
a neighbor structure with live references to it.
I was wondering if I have to delay the initialization of nbr_self even
further until we
get an interface up event. Given that there is a chance the router id
might not be initialized
yet, or more interface configurations commands, nbr_self might be reset
anyway.
There are two ways to do this. reset nbr_self every time there is an
up/down event,
or initialize it once and keep it set until there is an explicit even
that requires resetting
it, such as changing the router id or changing the link type or address.
There are several spots
in the code that need to be made aware of the fact that nbr_self could
be NULL, but that
should be straight forward. So, the options to move forward with this
are:
1- consider the patch enough as it stands
2- initialize nbr_self with the first interface up event and
keep it set as long as we have everything to keep it set
3- set self_nbr to NULL with interface down events, and reset with up
events.
Cheers.
Jafar
On 2016-04-15 18:15, Jafar Al-Gharaibeh wrote:
ospfd keeps a list of neighbor routers for each configured interface.
This list is indexed using the neighbor router id in case of
point-to-point and virtual link types, otherwise the list is indexed
using the source IP address of the neighbor on the link. This conforms
to RFC 2328 (page 96). The router adds itself as a "pseudo " neighbor
on each link, and also keeps a pointer called (nbr_self) to this
neighbor structure. This takes place when the interface is first
configured.
ospfd adds this pseudo neighbor before the link parameters are fully
configured, including whether the link type is point-to-point or
virtual
link. The causes the pseudo neighbor to be always indexed using the
source
address regardless of its link type. For point-to-point and virtual
links,
this causes the lookup for the pseudo neighbor to always fail because
the
lookup is done using the router id whereas the neighbor was added using
its
source address. This becomes really problematic if there is a state
change
that requires a rebuild of nbr_self, changing the router id for
example.
When resetting nbr_self, the router first tries to remove the pseudo
neighbor
form its neighbor list on each link by looking it up and resetting any
references to it before freeing the neighbor structure. since the
lookup
fails to retrieve any references in the case of point-to-point and
virtual
links the neighbor structure is freed leaving dangling references to
it.
Any access to the neighbor list after that is bound to stumble over
this
dangling pointer causing ospfd to crash.
Signed-off-by: Jafar Al-Gharaibeh
---
diff --git a/ospfd/ospf_interface.c b/ospfd/ospf_interface.c
index f4242b0..d54bc47 100644
--- a/ospfd/ospf_interface.c
+++ b/ospfd/ospf_interface.c
@@ -232,8 +232,8 @@ ospf_if_new (struct ospf *ospf, struct interface
*ifp, struct prefix *p)
/* Set default values. */
ospf_if_reset_variables (oi);
- /* Add pseudo neighbor. */
- oi->nbr_self = ospf_nbr_new (oi);
+ /* Set pseudo neighbor to Null */
+ oi->nbr_self = NULL;
oi->ls_upd_queue = route_table_init ();
oi->t_ls_upd_event = NULL;
@@ -902,7 +902,9 @@ ospf_vl_new (struct ospf *ospf, struct
ospf_vl_data *vl_data)
if (IS_DEBUG_OSPF_EVENT)
zlog_debug ("ospf_vl_new(): set associated area to the backbone");
- ospf_nbr_add_self (voi);
+ /* Add pseudo neighbor. */
+ ospf_nbr_self_reset (voi);
+
ospf_area_add_if (voi->area, voi);
ospf_if_stream_set (voi);
diff --git a/ospfd/ospf_neighbor.c b/ospfd/ospf_neighbor.c
index 862de5e..86ab442 100644
--- a/ospfd/ospf_neighbor.c
+++ b/ospfd/ospf_neighbor.c
@@ -181,6 +181,31 @@ ospf_nbr_delete (struct ospf_neighbor *nbr)
route_unlock_node (rn);
}
+ else
+{
+ /*
+ * This neighbor was not found, but bfore we move on and
+ * free the neighbor structre, make sure that it was not
+ * indexed incorrectly and ended up in the "worng" place
+ */
+
+ /* reverse the lookup rules */
+ if (oi->type == OSPF_IFTYPE_VIRTUALLINK ||
+ oi->type == OSPF_IFTYPE_POINTOPOINT)
+p.u.prefix4 = nbr->src;
+ else
+p.u.prefix4 = nbr->router_id;
+
+ rn = route_node_lookup (oi->nbrs, &p);
+ if (rn){
+/* We found the neighbor!
+ * now make sure it is not the exact same neighbor
+ * structure that we are about to free
+ */
+assert (nbr != rn->info);
+ }
+ route_unlock_node (rn);
+}
/* Free ospf_neighbor structure. */
ospf_nbr_free (nbr);
@@ -207,7 +232,9 @@ ospf_nbr_bidirectional (struct in_addr *rou