Attached is a txt file that contains all your comments and our
responses. Please go over them. Also we need clarification on these:
thiru-10a, thiru-30, thiru-31, thiru-33
Sangeeta
Thanks, here are the clarifications and a few others.
Thirumalai
--------------------------------------------------------------------------
thiru-10a ipfil_sendpkt: If ipfil_sendpkt -> ire_route_lookup happens for a
destination that is onlink ire_route_lookup() would return an IRE_INTERFACE
and sire would be null. The check at L731 in ire_forward must check whether
supplied_ire is null to determine the caller.
RESP: CLARIFY. You are correct in that L731 in ire_forward is incorrect.
But sire may null ( or onlink host) and a non-null for offlink host. So
should L731 just check for a non-null ire(and not check on sire at all) to
determine that the caller of ire_forward is ipfil_sendpkt instead?
Yes, that is right.
--------------------------------------------------------------------------
thiru-21 ire_get_bucket() assignment at L1047 is not required. This
actually happens in rn_newpair() at L495 radix.c. rdst and
There are 2 parts to my comment. The first part above is needed for
avoiding redundancy and confusion. The 2nd part below is a style issue,
so I will leave it to you. In both cases we don't expect any measurable
performance benefits.
initialization at L1030 - L1034 is not needed, instead
rt->rt_dst can be directly initialized at L1048.
RESP:INVESTIGATE Will see if above suggestion provides any perf benifits
--------------------------------------------------------------------------
thiru-22 ire_find_best_route() - Should not loop across the dupedkey chain.
That loop happens in rn_match_args().
RESP: REJECT
This is needed when, for example, we have an interface route
with a /26 mask, and we do
# route add 192.168.81.0/24 192.168.81.1
Both routes are saved under the same radix node. When we do
Not really. rn_addroute -> rn_insert() returns duplicate, rn_addroute()
then walks down the dupedkey list, inserts the new radixnode and returns it.
Thus we have 2 radix nodes linked via the dupedkey chain and with each radix
node pointing to the appropriate route. Please see the figure below in thiru-27.
ire_ftable_lookup we don't pass a mask to rn_match, so it
You mean rn_match_args, and not rn_match. Also
ip_rt_delete -> ire_ftable_lookup does pass a mask for match.
will only look at the head of the dupedkey list for a match. So,
if we now try to delete
# route delete 192.168.81.0/24 192.168.81.1
the delete will fail (because ire_match_args is only called for
the /26 route, with your suggestion).
Doesn't appear that way to me. But I could be easily missing something, since
Iam working blind-folded just based on code inspection. So let me go through
the above example.
We have 2 routes in the system corresponding to
# route add 192.168.81.0/26 192.168.81.1
# route add 192.168.81.0/24 192.168.81.1
The leaf node for 192.168.81.0/26 has its rn_dupedkey pointing to the
radix node for 192.168.81.0/24. (Fig thiru-27)
Now we try
# route delete 192.168.81.0/24 192.168.81.1
This will result in a lookup for 192.168.81.0/24. At L379 radix.c,
RN_MATCHF will be called on the leaf node for 192.168.81.0/26. This will fail.
Now we go down to 'keeplooking' and then enter the for loop at L406 where
we match each node in the dupedkey list and call RN_MATCHF on each of them
at L415. So you don't need the dupedkey chain logic in ire_find_best_route().
And if you really need that logic in ire_find_best_route() it is bad,
because all the radix tree implementation logic is hidden from IP and is
(should be) buried completely in radix.c.
However if you actually try the above on a test machine after removing
the dupedkey walk from ire_find_best_route(), I would expect it to fail
because of bug thiru-27 which needs to be fixed as a pre-requisite.
--------------------------------------------------------------------------
thiru-27. The code in rn_match_args L382 down should be as follows.
/*
* Although we found an exact match on the key, rn_leaf_fn
* is looking for some other criteria as well. Continue
* looking as if the exact match failed.
*/
b = 0;
goto keeplooking;
}
keydiff:
test = (*cp ^ *cp2) & 0xff; /* find first bit that differs */
for (b = 7; (test >>= 1) > 0; )
b--;
keeplooking:;
matched_off = cp - v;
b += matched_off << 3;
rn_bit = -1 - b;
The search key and the route matched completely in all bits upto
the netmask of the route, otherwise we would have gone to
'keydiff'. So the matched offset is really cp - v. The parent's
rn_offset represents the byte in which the search key is being
tested, and that is not relevant.
RESP: INVESTIGATE - will investigate the optimization suggestion.
Note that it is a correctness issue, and not an optimization. Let us go
through the same example as thiru-22. The tree looks as follows.
o Root
/ \
/ \
/ \
zero o P
/ \
/ \
/ \
R1 ones
|
|
R2
R1 - 192.168.81.0/26 (address 0xC0A85100), R1.rn_bit = -59 [-1 - 26](bias -32)
R2 - 192.168.81.0/24 R2.rn_bit = -57 (-1 -24) [bias -32]
P.rn_offset corresponds to the 1st byte of R1's address viz 0xC0.
P.rn_bit corresponds to the 1st zero bit in 0xC0 starting from the MSB.
With offsets starting from sockaddr_in,
P.rn_offset = 4, P.rn_bit = 34
Consider the lookup for 192.168.81.0/24. We hit R1, the for loop at L367
completes fully without mismatch on R1. Now rn_match_args() -> RN_MATCHF at
L379 fails since the mask does not match, so we get into the else at L387
and reset b to 0, matched_off = 4, we do down to 'keeplooking' and now
b = 32, rn_bit = -33 at L399. In the for loop down, the comparison at L414
will fail since rn_bit is > t->rn_bit, where t is each of the nodes in
the dupedkey list i.e. R1, R2 etc.
Instead consider the code fragment in thiru-27. Now matched_off is 8,
b = 64, rn_bit = -65. Thus the comparison at L414 will succeed for all
nodes in the dupedkeylist and the result will only depend on the RN_MATCHF.
--------------------------------------------------------------------------
thiru-30 Given the function pointer table rnh_* for the
lookup/add/delete operations, it is not good to also have
direct calls to parallel lookup functions that are not listed
in the table. rn_match is the table driven function. But
rn_match_args is used privately by IP. One solution is to add
another operation say rnh_matchaddr_args to the function table.
RESP: CLARIFY What would the extra level of indirection buy us?
Modularity, abstraction / encapsulation and uniformity.
I believe you want to hide the entire radix tree related code and data
structures as an abstraction to clients and let clients access the radix
tree via well defined entry points which are routed through a function table
pointer mechanism rather than letting them make direct calls into the
radix tree implementation.
--------------------------------------------------------------------------
thiru-31 ire_forward: The block at L876 for the IRE_CACHE is not very
useful. If the interface route is deleted the ire cache will
also be eventually deleted through the ihandle linkage. Instead
we should return much earlier in block L794 down, irrespective
of whether the nce_state is reachable or not.
RESP: CLARIFY.
Are you suggesting that the ire_ihandle_lookup_offlink() at L882 is
useless or that it should be moved up somewhere else?
Iam suggesting it can be removed. It is needed only if you want to create
a long lived IRE_CACHE for the offlink destination based on the IRE_CACHE
of the gateway. But that is not the case.
thiru-33 It is good to avoid dropping packets in the outbound side and instead
queue them and enable flow control if canputnext fails. In ip_wput_ire()
we check for canputnext at label blocked: and subsequently just do putnext.
Now the replacement of putnext with ip_xmit_v4, causes another check for
canputnext and then drop the packet. Instead ip_xmit_v4() must be passed
another boolean parameter. If this is true, we do the canputnext check
in ip_xmit_v4 and drop the packet (eg. forwarding path). If it is false
we just call putnext. (eg. ip_wput_ire path).
RESP: CLARIFY
There are two calls to ip_xmit_v4() in ip_wput_ire(0: one is at L27105
before the "blocked" label at 20804, the other is after it at L21118
Furthermore there is canput logic in ip_xmit_v4() itself in L27027
Are you suggesting the following changes?:
o We add another bootlean arg (lets call it "canputnext_check") to
ip_xmit_v4()
and change code at ip_xmit_v4() such that if this new arg is B_TRUE then
we
do the canputnext at L27027 of ip_xmit_v4, if B_FALSE the skip the
caputnext check
and just do a putnext.
o Have all callers of ip_xmit_v4() set it as B_TRUE *except* the two
callers of ip_xmit_v4() from ip_wput_ire(L27105, 21118)
I suppose this way we wont be doing double canput check in:
ip_wput_ire->ip_xmit_v4
Yes, that is correct.
but other than the redundancy, is there a flaw in doing canput check twice?
You would end up dropping packets instead of flow controlling and blocking
the application (unlike onnv) in case of say UDP apps.
--------------------------------------------------------------------------
_______________________________________________
networking-discuss mailing list
[email protected]