Re: Improving use of rt_refcnt

2015-07-15 Thread Ryota Ozaki
On Sat, Jul 4, 2015 at 9:52 PM, Ryota Ozaki  wrote:
> Hi,
>
> I'm trying to improve use of rt_refcnt: reducing
> abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
> route.c and extending it to treat referencing
> during packet processing (IOW, references from
> local variables) -- currently it handles only
> references between routes. The latter is needed for
> MP-safe networking.
>
> Here is a patch:
> http://www.netbsd.org/~ozaki-r/reduce-rt_refcnt-abuse.diff

BTW, are there any objections to the patch as a cleanup
of rt_refcnt usage?

  ozaki-r

>
> The patch passes all ATF tests and an additional test
> specific to refcnt that is not committed yet due
> to lack of refcnt outputs in netstat -r (See PR#50027).
>
> Note that the patch isn't ready for MP-safe yet;
> we need atomicity of refcnt operations somehow
> (locking is an easy solution for it). This work
> will be solved in another patch.
>
> Any comments or suggestions are welcome.
>
> Thanks,
>   ozaki-r


Re: Improving use of rt_refcnt

2015-07-15 Thread Ryota Ozaki
On Fri, Jul 10, 2015 at 6:10 AM, Joerg Sonnenberger
 wrote:
> On Sun, Jul 05, 2015 at 10:33:02PM -0700, Dennis Ferguson wrote:
>> If you don't want it to work this way then you'll need to replace the
>> radix tree with something that permits changes while readers are
>> concurrently operating.  To take best advantage of a more modern data
>> structure, however, you are still not going to want readers to ever
>> write the shared data structure if that can be avoided.  The two
>> atomic operations needed to increment and decrement a reference count
>> greatly exceed the cost of a (well-cached) route lookup.
>
> Let me pick the discussion up at this point since David mentioned that
> my last reply was somewhat terse. I think the current radix tree serves
> three different purposes right now:
>
> (1) Manage the view of the connectivity to the outside world in a way
> coherent with the administrator's intention or some routing
> protocol/daemon.
>
> (2) Provide a mechanism for finding the next-hop for traffic to not
> directly attached networks.
>
> (3) Provide a mechanism for finding L2 addresses on directly attached
> networks.
>
> Using a single data structure for this has the advantage of code sharing
> and can make detailed accounting very easy. It has the problem of
> overhead and mixing data of different levels of volatility. I would like
> to see the three mechanisms to be separated with appropriate data
> structures for each case. The first point would be moved out completely
> from the hot path, the actual packet handling case. It would then be no
> longer as performance sensitive, so options for storage can be more
> focused on size.
>
> For finding the next-hop, the problem is simplified. The number of
> next-hop addresses is (normally) limited by the size of the network
> neighborhood. Even a backend router at one of the major Internet
> exchange points will not have more than a few thousand next-hops,
> compared to having 200k routes or more. This can be exploited to reduce
> the data size of the BMP lookup data structure and by removing redundant
> entries, e.g. a longer prefix with the same next-hop as a shorter
> prefix. As I mentioned in my earlier email, the next-hop entry can and
> should store a reference to whatever L2 data is needed, so that no
> additional search is needed.
>
> For the L3->L2 address mapping, the problem changes from BMP search to
> an exact match search. If the mapping is managed correctly, it makes
> sense to do this (cheap) search first and skip the whole BMP lookup on
> a match as redundant. Hash tables and the like have also nice properties
> for read-mostly updates and cache density.
>
> Joerg

http://www.netbsd.org/~ozaki-r/separate-l2-nexthop.diff

I've written a POC patch toward the nexthop cache separation.
It is mostly based on the FreeBSD implementation.

It's still not mature; debug code remains and many places
are probably broken. But it works, at least to the extent that
it passes most of the ATF tests.

This is a demonstration of what we would need to do if we go
in that direction. I don't intend to commit it soon.

  ozaki-r


Re: Improving use of rt_refcnt

2015-07-09 Thread Joerg Sonnenberger
On Sun, Jul 05, 2015 at 10:33:02PM -0700, Dennis Ferguson wrote:
> If you don't want it to work this way then you'll need to replace the
> radix tree with something that permits changes while readers are
> concurrently operating.  To take best advantage of a more modern data
> structure, however, you are still not going to want readers to ever
> write the shared data structure if that can be avoided.  The two
> atomic operations needed to increment and decrement a reference count
> greatly exceed the cost of a (well-cached) route lookup.

Let me pick the discussion up at this point since David mentioned that
my last reply was somewhat terse. I think the current radix tree serves
three different purposes right now:

(1) Manage the view of the connectivity to the outside world in a way
coherent with the administrator's intention or some routing
protocol/daemon.

(2) Provide a mechanism for finding the next-hop for traffic to not
directly attached networks.

(3) Provide a mechanism for finding L2 addresses on directly attached
networks.

Using a single data structure for this has the advantage of code sharing
and can make detailed accounting very easy. It has the problem of
overhead and mixing data of different levels of volatility. I would like
to see the three mechanisms to be separated with appropriate data
structures for each case. The first point would be moved out completely
from the hot path, the actual packet handling case. It would then be no
longer as performance sensitive, so options for storage can be more
focused on size.

For finding the next-hop, the problem is simplified. The number of
next-hop addresses is (normally) limited by the size of the network
neighborhood. Even a backend router at one of the major Internet
exchange points will not have more than a few thousand next-hops,
compared to having 200k routes or more. This can be exploited to reduce
the data size of the BMP lookup data structure and by removing redundant
entries, e.g. a longer prefix with the same next-hop as a shorter
prefix. As I mentioned in my earlier email, the next-hop entry can and
should store a reference to whatever L2 data is needed, so that no
additional search is needed.
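
To illustrate the redundancy argument, here is a minimal sketch in C.
It is not taken from any patch in this thread: struct route_sketch,
route_sketch_prune() and the flat-array layout are names and choices
invented for the example, and it handles IPv4 only. An entry is dropped
when its closest shorter covering prefix already resolves to the same
next-hop, which leaves best-match-prefix results unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat route entry; not the kernel's struct rtentry. */
struct route_sketch {
	uint32_t prefix;	/* host byte order, bits beyond len are zero */
	unsigned len;		/* prefix length, 0..32 */
	int nexthop;		/* index into a (small) next-hop table */
};

static uint32_t
mask_of(unsigned len)
{
	return len == 0 ? 0 : (uint32_t)(~(uint32_t)0 << (32 - len));
}

/* True if a is strictly shorter than b and contains b. */
static bool
covers(const struct route_sketch *a, const struct route_sketch *b)
{
	return a->len < b->len && (b->prefix & mask_of(a->len)) == a->prefix;
}

/*
 * Copy to out[] every entry of in[] that is not redundant and return
 * the number of entries kept.  O(n^2), for clarity only.
 */
size_t
route_sketch_prune(const struct route_sketch *in, size_t n,
    struct route_sketch *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		const struct route_sketch *parent = NULL;

		/* Find the longest prefix that strictly covers in[i]. */
		for (size_t j = 0; j < n; j++)
			if (covers(&in[j], &in[i]) &&
			    (parent == NULL || in[j].len > parent->len))
				parent = &in[j];

		/* Same next-hop as the parent: a lookup would not change. */
		if (parent != NULL && parent->nexthop == in[i].nexthop)
			continue;
		out[kept++] = in[i];
	}
	return kept;
}
```

With 10.0.0.0/8 and 10.1.0.0/16 both pointing at the same next-hop, only
the /8 survives; a /16 pointing elsewhere is kept.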

For the L3->L2 address mapping, the problem changes from BMP search to
an exact match search. If the mapping is managed correctly, it makes
sense to do this (cheap) search first and skip the whole BMP lookup on
a match as redundant. Hash tables and the like have also nice properties
for read-mostly updates and cache density.
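
A rough sketch in C of the "exact match first, BMP only on a miss"
ordering follows. Everything in it is an assumption made for
illustration: the fixed-size open-addressing table, the names
neigh_sketch_*, resolve_sketch() and bmp_lookup_stub() are invented,
deletion is not handled, and the BMP lookup is a placeholder that only
counts how often it is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEIGH_BITS	6
#define NEIGH_SLOTS	(1u << NEIGH_BITS)

/* Hypothetical L3->L2 mapping entry (IPv4 address to 48-bit MAC). */
struct neigh_sketch {
	uint32_t ip;
	uint8_t mac[6];
	bool used;
};

struct neigh_table_sketch {
	struct neigh_sketch slot[NEIGH_SLOTS];
};

/* Multiplicative hash; keeps the top NEIGH_BITS bits of the product. */
static unsigned
neigh_hash(uint32_t ip)
{
	return (uint32_t)(ip * UINT32_C(2654435761)) >> (32 - NEIGH_BITS);
}

/* Insert or update a mapping; returns false if the table is full. */
bool
neigh_sketch_insert(struct neigh_table_sketch *t, uint32_t ip,
    const uint8_t mac[6])
{
	unsigned s = neigh_hash(ip);

	for (unsigned i = 0; i < NEIGH_SLOTS; i++, s = (s + 1) % NEIGH_SLOTS) {
		struct neigh_sketch *n = &t->slot[s];

		if (!n->used || n->ip == ip) {
			n->ip = ip;
			memcpy(n->mac, mac, 6);
			n->used = true;
			return true;
		}
	}
	return false;
}

/* Exact-match lookup with linear probing; NULL on a miss. */
const struct neigh_sketch *
neigh_sketch_lookup(const struct neigh_table_sketch *t, uint32_t ip)
{
	unsigned s = neigh_hash(ip);

	for (unsigned i = 0; i < NEIGH_SLOTS; i++, s = (s + 1) % NEIGH_SLOTS) {
		const struct neigh_sketch *n = &t->slot[s];

		if (!n->used)
			return NULL;
		if (n->ip == ip)
			return n;
	}
	return NULL;
}

/* Placeholder for the BMP search; it only records that it was called. */
unsigned bmp_stub_calls;

const struct neigh_sketch *
bmp_lookup_stub(uint32_t dst)
{
	(void)dst;
	bmp_stub_calls++;
	return NULL;
}

/* Try the cheap exact match first and skip the BMP search on a hit. */
const struct neigh_sketch *
resolve_sketch(const struct neigh_table_sketch *t, uint32_t dst)
{
	const struct neigh_sketch *n = neigh_sketch_lookup(t, dst);

	return n != NULL ? n : bmp_lookup_stub(dst);
}
```

A destination on an attached network never reaches the BMP structure
here; only misses do.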

Joerg


Re: Improving use of rt_refcnt

2015-07-09 Thread Mouse
> The thing is that pretty much all the networks that were "normal" in
> 1980 had disappeared by about 1990, leaving only networks that worked
> like DIX ethernet.  You would think the code would have been
> restructured for the new "normal" since then, but I guess old code
> dies hard.

Well, don't forget that there still are a few non-Ethernetty networks
left.  While it would make sense to optimize for Ethernet and its ilk,
the generality still needs to be kept around.

While this is hardly impossible, I daresay it reduces the motivation to
change the current system.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Improving use of rt_refcnt

2015-07-09 Thread Ryota Ozaki
On Thu, Jul 9, 2015 at 1:28 PM, Dennis Ferguson
 wrote:
>
> On 7 Jul, 2015, at 21:25 , Ryota Ozaki  wrote:
>
>> BTW what do you think of separating L2 tables (ARP/NDP) from the L3
>> routing tables? The separation gets rid of cloning/cloned route
>> features and that makes it easy to introduce locks in route.c.
>> (Currently rtrequest1 can be called recursively to remove cloned
>> routes and that makes it hard to use locks.) I read your paper
>> (BSDNetworking.pdf) and it seems to suggest to maintain L2 routes
>> in the common routing table (I may misunderstand your opinion).
>
> I think it is worth stepping back and thinking about what the end
> result of the most common type of access to the route table (a
> forwarding operation, done by a reader who wants to know what to do
> with a packet it has) is going to be, since this is the operation you
> want to optimize.  If the packet is to be sent out an interface then
> the result of the work you are doing is that an L2 header will be
> prepended to the packet and the packet will be queued to an interface
> for transmission.
>
> To make this direct and fast what you want is for the result of the
> route lookup to point directly at the thing that knows what L2 header
> needs to be added and which interface the packet needs to be delivered
> to.  If you have that then all that remains to be done after the
> route lookup is to make space at the front of the packet for the L2
> header, memcpy() it in and give the resulting frame to the interface.
> So you want the route lookup organized to get you from the addresses
> in the packet you are processing to the L2 header and interface you
> need to use to send a packet like that as directly as possible.
>
> While we could talk about how the route lookup might be structured
> to better get directly to the point (this involves splitting the
> rtentry into a "route" part and a "nexthop" part, the latter being
> the result of a lookup and having the data needed to deliver the
> packet with minimal extra work), this probably isn't relevant to
> your question.  What I did want to point out, however, is that
> knowledge of the next hop IP address is (generally) entirely
> unnecessary to forward a packet.  All forwarding operations want
> to know is the L2 header to add to the packet.  Of course ARP or
> ND will have used the next hop IP address to determine the L2 header
> to attach to the packet, but once this is known all packet forwarding
> wants is the result, the L2 header, and doesn't care how that was
> arrived at.  What this means is that your proposed use of the next
> hop IP address is a gratuitous indirection; you would be taking
> something which would be best done as
>
>  -> 
>
> and instead turning this into
>
>  ->  ->  -> 
> 
>
> This will likely always be significantly more expensive than the direct
> alternative.  The indirection is also easy to resolve up front, when a route
> is added, so there's no need to do it over and over again for each forwarded
> packet, and failing to do it when routes are installed moves yet another
> data structure (per-interface) into the forwarding path that will need to
> be dealt with if you eventually want to eliminate the locks.  I think
> you shouldn't do this, or anything else that requires if_output() to
> look at the next hop IP address, since that indirection should go away.
>
> The neat thing about this is that the internal arrangement that makes
> one think that the next hop IP address is an important result of a route
> lookup (it is listed as one in the rtentry structure, and if_output()
> takes it as an argument) is actually a historical artifact.  I think
> this code was written in about 1980.  Then, as now, the point of the
> route lookup was to determine the L2 header to prepend to the packet
> and the interface to queue it to, but what was different was the networks
> that existed then.  Almost all of them did  -> 
> mapping by storing the variable bits of the L2 header directly in the
> local bits of the IP address; see RFC796 and RFC895 for a whole bunch of
> examples (the all-zeros-host-part directed broadcast address that 4.2BSD
> used came from the mapping for experimental ethernet).  This meant that
> the next hop IP address wasn't an indirection at all, it was directly
> the data you needed to construct the L2 header to add to the packet.
> The original exception to this was DIX Ethernet, with its 48 bit MAC
> addresses that were too big to store that way, so the idea of
> implementing an ARP cache in the interface code and using the next hop
> IP address as a less efficient indirection to the L2 header data for
> that type of interface, was invented to make DIX Ethernet look like a
> "normal" interface where the next hop IP address directly and efficiently
> provided the L2 bits you needed to know to send the packet.
>
> The thing is that pretty much all the networks that were "normal"
> in 1980 had disappeared by about 1990, leaving only ne

Re: Improving use of rt_refcnt

2015-07-09 Thread Joerg Sonnenberger
On Wed, Jul 08, 2015 at 09:28:01PM -0700, Dennis Ferguson wrote:
> What this means is that your proposed use of the next
> hop IP address is a gratuitous indirection; you would be taking
> something which would be best done as
> 
>  -> 
> 
> and instead turning this into
> 
>  ->  ->  -> 
> 

This is the part I disagree with. There are generally two cases here:
- the BMP is a local network
- the BMP is not a local network

In the second case, the route can store a direct reference to the L2
address without artificial entries in the table and without additional
lookup. There are some potential issues to consider for dealing with
multiple interfaces sharing IP ranges, but that's a different question.

For the first case, storing cloned routes or doing a hashed target
lookup is very likely to have similar performance and often the latter
option is going to be faster.

Joerg


Re: Improving use of rt_refcnt

2015-07-08 Thread Dennis Ferguson

On 7 Jul, 2015, at 21:25 , Ryota Ozaki  wrote:

> BTW what do you think of separating L2 tables (ARP/NDP) from the L3
> routing tables? The separation gets rid of cloning/cloned route
> features and that makes it easy to introduce locks in route.c.
> (Currently rtrequest1 can be called recursively to remove cloned
> routes and that makes it hard to use locks.) I read your paper
> (BSDNetworking.pdf) and it seems to suggest to maintain L2 routes
> in the common routing table (I may misunderstand your opinion).

I think it is worth stepping back and thinking about what the end
result of the most common type of access to the route table (a
forwarding operation, done by a reader who wants to know what to do
with a packet it has) is going to be, since this is the operation you
want to optimize.  If the packet is to be sent out an interface then
the result of the work you are doing is that an L2 header will be
prepended to the packet and the packet will be queued to an interface
for transmission.

To make this direct and fast what you want is for the result of the
route lookup to point directly at the thing that knows what L2 header
needs to be added and which interface the packet needs to be delivered
to.  If you have that then all that remains to be done after the
route lookup is to make space at the front of the packet for the L2
header, memcpy() it in and give the resulting frame to the interface.
So you want the route lookup organized to get you from the addresses
in the packet you are processing to the L2 header and interface you
need to use to send a packet like that as directly as possible.
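
As a concrete picture of "make space, memcpy() it in, give the frame to
the interface", here is a small sketch in C. It is an illustration
under assumptions, not existing kernel code: struct nexthop_sketch,
struct pkt_sketch (a flat buffer with headroom standing in for an mbuf)
and nexthop_sketch_encap() are all invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L2_HDR_MAX	32

/* Hypothetical result of a route lookup: all that forwarding needs. */
struct nexthop_sketch {
	int ifindex;			/* interface to queue the frame to */
	size_t l2_len;			/* length of the prebuilt header */
	uint8_t l2_hdr[L2_HDR_MAX];	/* filled in when the route is added */
};

/* Flat stand-in for an mbuf: valid data is buf[off .. off + len). */
struct pkt_sketch {
	uint8_t buf[256];
	size_t off;	/* headroom in front of the packet */
	size_t len;
};

/*
 * Prepend the prebuilt L2 header and report the output interface.
 * Returns -1 and leaves the packet untouched if headroom is short.
 * The next hop IP address is never consulted.
 */
int
nexthop_sketch_encap(const struct nexthop_sketch *nh, struct pkt_sketch *p)
{
	if (nh->l2_len > p->off)
		return -1;
	p->off -= nh->l2_len;
	p->len += nh->l2_len;
	memcpy(p->buf + p->off, nh->l2_hdr, nh->l2_len);
	return nh->ifindex;
}
```

The per-packet work after the lookup is one bounds check and one copy;
whatever ARP or ND did to produce l2_hdr happened earlier.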

While we could talk about how the route lookup might be structured
to better get directly to the point (this involves splitting the
rtentry into a "route" part and a "nexthop" part, the latter being
the result of a lookup and having the data needed to deliver the
packet with minimal extra work), this probably isn't relevant to
your question.  What I did want to point out, however, is that
knowledge of the next hop IP address is (generally) entirely
unnecessary to forward a packet.  All forwarding operations want
to know is the L2 header to add to the packet.  Of course ARP or
ND will have used the next hop IP address to determine the L2 header
to attach to the packet, but once this is known all packet forwarding
wants is the result, the L2 header, and doesn't care how that was
arrived at.  What this means is that your proposed use of the next
hop IP address is a gratuitous indirection; you would be taking
something which would be best done as

 -> 

and instead turning this into

 ->  ->  -> 

This will likely always be significantly more expensive than the direct
alternative.  The indirection is also easy to resolve up front, when a route
is added, so there's no need to do it over and over again for each forwarded
packet, and failing to do it when routes are installed moves yet another
data structure (per-interface) into the forwarding path that will need to
be dealt with if you eventually want to eliminate the locks.  I think
you shouldn't do this, or anything else that requires if_output() to
look at the next hop IP address, since that indirection should go away.

The neat thing about this is that the internal arrangement that makes
one think that the next hop IP address is an important result of a route
lookup (it is listed as one in the rtentry structure, and if_output()
takes it as an argument) is actually a historical artifact.  I think
this code was written in about 1980.  Then, as now, the point of the
route lookup was to determine the L2 header to prepend to the packet
and the interface to queue it to, but what was different was the networks
that existed then.  Almost all of them did  -> 
mapping by storing the variable bits of the L2 header directly in the
local bits of the IP address; see RFC796 and RFC895 for a whole bunch of
examples (the all-zeros-host-part directed broadcast address that 4.2BSD
used came from the mapping for experimental ethernet).  This meant that
the next hop IP address wasn't an indirection at all, it was directly
the data you needed to construct the L2 header to add to the packet.
The original exception to this was DIX Ethernet, with its 48 bit MAC
addresses that were too big to store that way, so the idea of
implementing an ARP cache in the interface code and using the next hop
IP address as a less efficient indirection to the L2 header data for
that type of interface, was invented to make DIX Ethernet look like a
"normal" interface where the next hop IP address directly and efficiently
provided the L2 bits you needed to know to send the packet.

The thing is that pretty much all the networks that were "normal"
in 1980 had disappeared by about 1990, leaving only networks that
worked like DIX ethernet.  You would think the code would have been
restructured for the new "normal" since then, but I guess old code
dies hard.

Dennis Ferguson

Re: Improving use of rt_refcnt

2015-07-07 Thread Ryota Ozaki
On Wed, Jul 8, 2015 at 11:57 AM, Ryota Ozaki  wrote:
> On Mon, Jul 6, 2015 at 2:33 PM, Dennis Ferguson
>  wrote:
>>
>> On 5 Jul, 2015, at 19:02 , Ryota Ozaki  wrote:
>>> On Sun, Jul 5, 2015 at 6:50 PM, Joerg Sonnenberger
>>>  wrote:
>>>> On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
>>>>> On Sun, Jul 5, 2015 at 2:35 AM, David Young  wrote:
>>>>>> On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
>>>>>>> I'm trying to improve use of rt_refcnt: reducing
>>>>>>> abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
>>>>>>> route.c and extending it to treat referencing
>>>>>>> during packet processing (IOW, references from
>>>>>>> local variables) -- currently it handles only
>>>>>>> references between routes. The latter is needed for
>>>>>>> MP-safe networking.
>>>>>>
>>>>>> Do you propose to increase/decrease rt_refcnt in the packet processing
>>>>>> path, using atomic instructions?
>>>>>
>>>>> Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
>>>>> I will introduce them later. (Using refcount(9) by riastradh would be
>>>>> good once it is committed.)
>>>>
>>>> I think the main point that David wanted to raise is that the normal
>>>> path for packets should *not* do any ref count changes at all.
>>>
>>> Why? rtentry can be freed during the normal path in MP-safe world.
>>> Do you suggest using pserialize instead?
>>
>> I don't think either a reference count or pserialize, or anything else
>> that is non-blocking for readers, can be used to protect the data
>> structure rtentry's are now stored in.
>
> I'm sorry for the confusion; our first attempt isn't intended to provide
> non-blocking reader operations. However, I had misunderstood
> the characteristics of the routing table that you describe below.
>
>>
>> If you want readers to continue while a data structure is being modified
>> then the modification must be implemented so that concurrent readers see
>> the structure in a consistent state (i.e. one that produces either the
>> before-the-change or the after-the-change result) at every point during
>> the change.  Since the current radix tree does not work this way the only
>> way to make a change to it is to block the readers while the change is
>> being made, i.e. with a lock.  An rtentry will hence never be freed while
>> the normal (reader) path is looking at it since you'll be preventing those
>> readers from looking at anything in that structure while you are changing it.
>
> I see what you mean. The current implementation merely avoids freeing an
> rtentry while there are references to it, but it does modify an rtentry
> regardless of its refcnt. I'll use a lock somehow to prevent the latter.
> Nonetheless, I think my patch is still useful to prevent the former. (And
> anyway we have to reduce awkward uses of refcnt.)

BTW what do you think of separating L2 tables (ARP/NDP) from the L3
routing tables? The separation gets rid of cloning/cloned route
features and that makes it easy to introduce locks in route.c.
(Currently rtrequest1 can be called recursively to remove cloned
routes and that makes it hard to use locks.) I read your paper
(BSDNetworking.pdf) and it seems to suggest to maintain L2 routes
in the common routing table (I may misunderstand your opinion).

Thanks,
  ozaki-r

>
>>
>> If you don't want it to work this way then you'll need to replace the
>> radix tree with something that permits changes while readers are
>> concurrently operating.
>
> IIUC, your rttree(9) satisfies the requirement, right? We're evaluating
> rttree(9) as a replacement of the current radix tree. Do you have any
> updates on rttree(9)?
>
>   ozaki-r
>
>> To take best advantage of a more modern data
>> structure, however, you are still not going to want readers to ever
>> write the shared data structure if that can be avoided.  The two
>> atomic operations needed to increment and decrement a reference count
>> greatly exceed the cost of a (well-cached) route lookup.
>>
>> Dennis Ferguson


Re: Improving use of rt_refcnt

2015-07-07 Thread Ryota Ozaki
On Mon, Jul 6, 2015 at 2:33 PM, Dennis Ferguson
 wrote:
>
> On 5 Jul, 2015, at 19:02 , Ryota Ozaki  wrote:
>> On Sun, Jul 5, 2015 at 6:50 PM, Joerg Sonnenberger
>>  wrote:
>>> On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
>>>> On Sun, Jul 5, 2015 at 2:35 AM, David Young  wrote:
>>>>> On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
>>>>>> I'm trying to improve use of rt_refcnt: reducing
>>>>>> abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
>>>>>> route.c and extending it to treat referencing
>>>>>> during packet processing (IOW, references from
>>>>>> local variables) -- currently it handles only
>>>>>> references between routes. The latter is needed for
>>>>>> MP-safe networking.
>>>>>
>>>>> Do you propose to increase/decrease rt_refcnt in the packet processing
>>>>> path, using atomic instructions?
>>>>
>>>> Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
>>>> I will introduce them later. (Using refcount(9) by riastradh would be
>>>> good once it is committed.)
>>>
>>> I think the main point that David wanted to raise is that the normal
>>> path for packets should *not* do any ref count changes at all.
>>
>> Why? rtentry can be freed during the normal path in MP-safe world.
>> Do you suggest using pserialize instead?
>
> I don't think either a reference count or pserialize, or anything else
> that is non-blocking for readers, can be used to protect the data
> structure rtentry's are now stored in.

I'm sorry for the confusion; our first attempt isn't intended to provide
non-blocking reader operations. However, I had misunderstood
the characteristics of the routing table that you describe below.

>
> If you want readers to continue while a data structure is being modified
> then the modification must be implemented so that concurrent readers see
> the structure in a consistent state (i.e. one that produces either the
> before-the-change or the after-the-change result) at every point during
> the change.  Since the current radix tree does not work this way the only
> way to make a change to it is to block the readers while the change is
> being made, i.e. with a lock.  An rtentry will hence never be freed while
> the normal (reader) path is looking at it since you'll be preventing those
> readers from looking at anything in that structure while you are changing it.

I see what you mean. The current implementation merely avoids freeing an
rtentry while there are references to it, but it does modify an rtentry
regardless of its refcnt. I'll use a lock somehow to prevent the latter.
Nonetheless, I think my patch is still useful to prevent the former. (And
anyway we have to reduce awkward uses of refcnt.)

>
> If you don't want it to work this way then you'll need to replace the
> radix tree with something that permits changes while readers are
> concurrently operating.

IIUC, your rttree(9) satisfies the requirement, right? We're evaluating
rttree(9) as a replacement of the current radix tree. Do you have any
updates on rttree(9)?

  ozaki-r

> To take best advantage of a more modern data
> structure, however, you are still not going to want readers to ever
> write the shared data structure if that can be avoided.  The two
> atomic operations needed to increment and decrement a reference count
> greatly exceed the cost of a (well-cached) route lookup.
>
> Dennis Ferguson


Re: Improving use of rt_refcnt

2015-07-06 Thread David Young
On Sun, Jul 05, 2015 at 11:50:12AM +0200, Joerg Sonnenberger wrote:
> I think the main point that David wanted to raise is that the normal
> path for packets should *not* do any ref count changes at all.

I wasn't trying to make a point.  I wanted to make sure that I properly
understood Ryota's plans.

Dave

-- 
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981


Re: Improving use of rt_refcnt

2015-07-05 Thread Dennis Ferguson

On 5 Jul, 2015, at 19:02 , Ryota Ozaki  wrote:
> On Sun, Jul 5, 2015 at 6:50 PM, Joerg Sonnenberger
>  wrote:
>> On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
>>> On Sun, Jul 5, 2015 at 2:35 AM, David Young  wrote:
>>>> On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
>>>>> I'm trying to improve use of rt_refcnt: reducing
>>>>> abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
>>>>> route.c and extending it to treat referencing
>>>>> during packet processing (IOW, references from
>>>>> local variables) -- currently it handles only
>>>>> references between routes. The latter is needed for
>>>>> MP-safe networking.
>>>>
>>>> Do you propose to increase/decrease rt_refcnt in the packet processing
>>>> path, using atomic instructions?
>>> 
>>> Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
>>> I will introduce them later. (Using refcount(9) by riastradh would be
>>> good once it is committed.)
>> 
>> I think the main point that David wanted to raise is that the normal
>> path for packets should *not* do any ref count changes at all.
> 
> Why? rtentry can be freed during the normal path in MP-safe world.
> Do you suggest using pserialize instead?

I don't think either a reference count or pserialize, or anything else
that is non-blocking for readers, can be used to protect the data
structure rtentry's are now stored in.

If you want readers to continue while a data structure is being modified
then the modification must be implemented so that concurrent readers see
the structure in a consistent state (i.e. one that produces either the
before-the-change or the after-the-change result) at every point during
the change.  Since the current radix tree does not work this way the only
way to make a change to it is to block the readers while the change is
being made, i.e. with a lock.  An rtentry will hence never be freed while
the normal (reader) path is looking at it since you'll be preventing those
readers from looking at anything in that structure while you are changing it.
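
For contrast with the lock-based approach, the following C11 sketch
shows what "readers see either the before or the after state at every
point" can look like for a far simpler structure than a radix tree: a
singly linked list with insertion at the head. It is an illustration
only. The names list_sketch_* are invented, a single writer (or writers
serialized by a lock) is assumed, and removal, which needs deferred
freeing, is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node_sketch {
	int key;
	int value;
	struct node_sketch *_Atomic next;
};

struct list_sketch {
	struct node_sketch *_Atomic head;
};

/*
 * Writer side.  The node is completely filled in while it is still
 * private, then made visible with a single release store, so a reader
 * sees either the old list or the new list, never a half-built node.
 */
void
list_sketch_insert(struct list_sketch *l, struct node_sketch *n,
    int key, int value)
{
	n->key = key;
	n->value = value;
	atomic_init(&n->next,
	    atomic_load_explicit(&l->head, memory_order_relaxed));
	atomic_store_explicit(&l->head, n, memory_order_release);
}

/* Reader side.  Only loads; no shared memory is ever written. */
const struct node_sketch *
list_sketch_lookup(struct list_sketch *l, int key)
{
	struct node_sketch *n;

	for (n = atomic_load_explicit(&l->head, memory_order_acquire);
	    n != NULL;
	    n = atomic_load_explicit(&n->next, memory_order_acquire))
		if (n->key == key)
			return n;
	return NULL;
}
```

A structure that cannot be updated in such single-step, always-valid
transitions is the case described above, where readers have to be
blocked for the duration of the change.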

If you don't want it to work this way then you'll need to replace the
radix tree with something that permits changes while readers are
concurrently operating.  To take best advantage of a more modern data
structure, however, you are still not going to want readers to ever
write the shared data structure if that can be avoided.  The two
atomic operations needed to increment and decrement a reference count
greatly exceed the cost of a (well-cached) route lookup.
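
The two atomic operations in question can be sketched with C11 atomics
as below. This only shows where the per-packet writes to shared memory
come from; it measures nothing, and it is not the refcount(9) API
mentioned earlier in the thread. struct rt_sketch and the rt_sketch_*
functions are invented for the example, and a flag is set where a real
implementation would free the entry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a route entry; not struct rtentry. */
struct rt_sketch {
	atomic_uint refcnt;
	bool freed;	/* set instead of freeing, to keep this testable */
};

void
rt_sketch_init(struct rt_sketch *rt)
{
	atomic_init(&rt->refcnt, 1);	/* the table's own reference */
	rt->freed = false;
}

/* First atomic read-modify-write a reader would pay per packet. */
void
rt_sketch_ref(struct rt_sketch *rt)
{
	atomic_fetch_add_explicit(&rt->refcnt, 1, memory_order_relaxed);
}

/* Second one; whoever drops the last reference tears the entry down. */
void
rt_sketch_unref(struct rt_sketch *rt)
{
	if (atomic_fetch_sub_explicit(&rt->refcnt, 1,
	    memory_order_acq_rel) == 1)
		rt->freed = true;
}
```

Both calls write the cache line holding refcnt, so every CPU that
forwards through the same entry contends for that line even though the
lookup itself only reads.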

Dennis Ferguson


Re: Improving use of rt_refcnt

2015-07-05 Thread Ryota Ozaki
On Sun, Jul 5, 2015 at 6:50 PM, Joerg Sonnenberger
 wrote:
> On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
>> On Sun, Jul 5, 2015 at 2:35 AM, David Young  wrote:
>> > On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
>> >> I'm trying to improve use of rt_refcnt: reducing
>> >> abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
>> >> route.c and extending it to treat referencing
>> >> during packet processing (IOW, references from
>> >> local variables) -- currently it handles only
>> >> references between routes. The latter is needed for
>> >> MP-safe networking.
>> >
>> > Do you propose to increase/decrease rt_refcnt in the packet processing
>> > path, using atomic instructions?
>>
>> Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
>> I will introduce them later. (Using refcount(9) by riastradh would be
>> good once it is committed.)
>
> I think the main point that David wanted to raise is that the normal
> path for packets should *not* do any ref count changes at all.

Why? rtentry can be freed during the normal path in MP-safe world.
Do you suggest using pserialize instead?

  ozaki-r

>
> Joerg


Re: Improving use of rt_refcnt

2015-07-05 Thread Joerg Sonnenberger
On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
> On Sun, Jul 5, 2015 at 2:35 AM, David Young  wrote:
> > On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
> >> I'm trying to improve use of rt_refcnt: reducing
> >> abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
> >> route.c and extending it to treat referencing
> >> during packet processing (IOW, references from
> >> local variables) -- currently it handles only
> >> references between routes. The latter is needed for
> >> MP-safe networking.
> >
> > Do you propose to increase/decrease rt_refcnt in the packet processing
> > path, using atomic instructions?
> 
> Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
> I will introduce them later. (Using refcount(9) by riastradh would be
> good once it is committed.)

I think the main point that David wanted to raise is that the normal
path for packets should *not* do any ref count changes at all.

Joerg


Re: Improving use of rt_refcnt

2015-07-04 Thread Ryota Ozaki
On Sun, Jul 5, 2015 at 2:35 AM, David Young  wrote:
> On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
>> I'm trying to improve use of rt_refcnt: reducing
>> abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
>> route.c and extending it to treat referencing
>> during packet processing (IOW, references from
>> local variables) -- currently it handles only
>> references between routes. The latter is needed for
>> MP-safe networking.
>
> Do you propose to increase/decrease rt_refcnt in the packet processing
> path, using atomic instructions?

Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
I will introduce them later. (Using refcount(9) by riastradh would be
good once it is committed.)

  ozaki-r

>
> Dave
>
> --
> David Young
> dyo...@pobox.com    Urbana, IL    (217) 721-9981


Re: Improving use of rt_refcnt

2015-07-04 Thread David Young
On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
> I'm trying to improve use of rt_refcnt: reducing
> abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
> route.c and extending it to treat referencing
> during packet processing (IOW, references from
> local variables) -- currently it handles only
> references between routes. The latter is needed for
> MP-safe networking.

Do you propose to increase/decrease rt_refcnt in the packet processing
path, using atomic instructions?

Dave

-- 
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981