Re: Improving use of rt_refcnt

2015-07-15 Thread Ryota Ozaki
On Fri, Jul 10, 2015 at 6:10 AM, Joerg Sonnenberger
jo...@britannica.bec.de wrote:
 On Sun, Jul 05, 2015 at 10:33:02PM -0700, Dennis Ferguson wrote:
 If you don't want it to work this way then you'll need to replace the
 radix tree with something that permits changes while readers are
 concurrently operating.  To take best advantage of a more modern data
 structure, however, you are still not going to want readers to ever
 write the shared data structure if that can be avoided.  The two
 atomic operations needed to increment and decrement a reference count
 greatly exceed the cost of a (well-cached) route lookup.

 Let me pick the discussion up at this point since David mentioned that
 my last reply was somewhat terse. I think the current radix tree serves
 three different purposes right now:

 (1) Manage the view of the connectivity to the outside world in a way
 coherent with the administrator's intention or some routing
 protocol/daemon.

 (2) Provide a mechanism for finding the next-hop for traffic to not
 directly attached networks.

 (3) Provide a mechanism for finding L2 addresses on directly attached
 networks.

 Using a single data structure for this has the advantage of code sharing
 and can make detailed accounting very easy. It has the problem of
 overhead and mixing data of different levels of volatility. I would like
 to see the three mechanisms to be separated with appropriate data
 structures for each case. The first point would be moved out completely
 from the hot path, the actual packet handling case. It would then be no
 longer as performance sensitive, so options for storage can be more
 focused on size.

 For finding the next-hop, the problem is simplified. The number of
 next-hop addresses is (normally) limited by the size of the network
 neighborhood. Even a backend router at one of the major Internet
 exchange points will not have more than a few thousand next-hops,
 compared to having 200k routes or more. This can be exploited to reduce
 the data size of the BMP lookup data structure and by removing redundant
 entries, e.g. a longer prefix with the same next-hop as a shorter
 prefix. As I mentioned in my earlier email, the next-hop entry can and
 should store a reference to whatever L2 data is needed, so that no
 additional search is needed.

 For the L3-L2 address mapping, the problem changes from BMP search to
 an exact match search. If the mapping is managed correctly, it makes
 sense to do this (cheap) search first and skip the whole BMP lookup on
 a match as redundant. Hash tables and the like have also nice properties
 for read-mostly updates and cache density.

 Joerg

http://www.netbsd.org/~ozaki-r/separate-l2-nexthop.diff

I've written a POC patch toward the nexthop cache separation.
It is mostly based on the FreeBSD implementation.

It's still not mature; debug code remains and many places
are probably broken. But it works, or at least it passes
most of the ATF tests.

This is a demonstration of what we would need to do if we go
in this direction. I don't intend to commit it soon.

  ozaki-r


Re: Improving use of rt_refcnt

2015-07-15 Thread Ryota Ozaki
On Sat, Jul 4, 2015 at 9:52 PM, Ryota Ozaki ozak...@netbsd.org wrote:
 Hi,

 I'm trying to improve use of rt_refcnt: reducing
 abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
 route.c and extending it to treat referencing
 during packet processing (IOW, references from
 local variables) -- currently it handles only
 references between routes. The latter is needed for
 MP-safe networking.

 Here is a patch:
 http://www.netbsd.org/~ozaki-r/reduce-rt_refcnt-abuse.diff

BTW, are there any objections to the patch as a cleanup
of the use of rt_refcnt?

  ozaki-r


 The patch passes all ATF tests and an additional test
 specific to refcnt that is not committed yet due
 to lack of refcnt outputs in netstat -r (See PR#50027).

 Note that the patch isn't ready for MP-safe yet;
 we need atomicity of refcnt operations somehow
 (locking is an easy solution for it). This work
 will be solved in another patch.

 Any comments or suggestions are welcome.

 Thanks,
   ozaki-r


Re: Improving use of rt_refcnt

2015-07-09 Thread Joerg Sonnenberger
On Wed, Jul 08, 2015 at 09:28:01PM -0700, Dennis Ferguson wrote:
 What this means is that your proposed use of the next
 hop IP address is a gratuitous indirection; you would be taking
 something which would be best done as
 
 route lookup -> L2 header
 
 and instead turning this into
 
 route lookup -> next hop IP address -> next hop address lookup ->
 L2 header

This is the part I disagree with. There are generally two cases here:
- the BMP is a local network
- the BMP is not a local network

In the second case, the route can store a direct reference to the L2
address without artificial entries in the table and without additional
lookup. There are some potential issues to consider for dealing with
multiple interfaces sharing IP ranges, but that's a different question.

For the first case, storing cloned routes or doing a hashed target
lookup is very likely to have similar performance and often the latter
option is going to be faster.

Joerg


Re: Improving use of rt_refcnt

2015-07-09 Thread Ryota Ozaki
On Thu, Jul 9, 2015 at 1:28 PM, Dennis Ferguson
dennis.c.fergu...@gmail.com wrote:

 On 7 Jul, 2015, at 21:25 , Ryota Ozaki ozak...@netbsd.org wrote:

 BTW how do you think of separating L2 tables (ARP/NDP) from the L3
 routing tables? The separation gets rid of cloning/cloned route
 features and that makes it easy to introduce locks in route.c.
 (Currently rtrequest1 can be called recursively to remove cloned
 routes and that makes it hard to use locks.) I read your paper
 (BSDNetworking.pdf) and it seems to suggest to maintain L2 routes
 in the common routing table (I may misunderstand your opinion).

 I think it is worth stepping back and thinking about what the end
 result of the most common type of access to the route table (a
 forwarding operation, done by a reader who wants to know what to do
 with a packet it has) is going to be, since this is the operation you
 want to optimize.  If the packet is to be sent out an interface then
 the result of the work you are doing is that an L2 header will be
 prepended to the packet and the packet will be queued to an interface
 for transmission.

 To make this direct and fast what you want is for the result of the
 route lookup to point directly at the thing that knows what L2 header
 needs to be added and which interface the packet needs to be delivered
 to.  If you have that then all that remains to be done after the
 route lookup is to make space at the front of the packet for the L2
 header, memcpy() it in and give the resulting frame to the interface.
 So you want the route lookup organized to get you from the addresses
 in the packet you are processing to the L2 header and interface you
 need to use to send a packet like that as directly as possible.

 While we could talk about how the route lookup might be structured
 to better get directly to the point (this involves splitting the
 rtentry into a route part and a nexthop part, the latter being
 the result of a lookup and having the data needed to deliver the
 packet with minimal extra work), this probably isn't relevant to
 your question.  What I did want to point out, however, is that
 knowledge of the next hop IP address is (generally) entirely
 unnecessary to forward a packet.  All forwarding operations want
 to know is the L2 header to add to the packet.  Of course ARP or
 ND will have used the next hop IP address to determine the L2 header
 to attach to the packet, but once this is known all packet forwarding
 wants is the result, the L2 header, and doesn't care how that was
 arrived at.  What this means is that your proposed use of the next
 hop IP address is a gratuitous indirection; you would be taking
 something which would be best done as

 route lookup -> L2 header

 and instead turning this into

 route lookup -> next hop IP address -> next hop address lookup ->
 L2 header

 This will likely always be significantly more expensive than the direct
 alternative.  The indirection is also easy to resolve up front, when a route
 is added, so there's no need to do it over and over again for each forwarded
 packet, and failing to do it when routes are installed moves yet another
 data structure (per-interface) into the forwarding path that will need to
 be dealt with if you eventually want to eliminate the locks.  I think
 you shouldn't do this, or anything else that requires if_output() to
 look at the next hop IP address, since that indirection should go away.

 The neat thing about this is that the internal arrangement that makes
 one think that the next hop IP address is an important result of a route
 lookup (it is listed as one in the rtentry structure, and if_output()
 takes it as an argument) is actually a historical artifact.  I think
 this code was written in about 1980.  Then, as now, the point of the
 route lookup was to determine the L2 header to prepend to the packet
 and the interface to queue it to, but what was different was the networks
 that existed then.  Almost all of them did IP address -> L2 header
 mapping by storing the variable bits of the L2 header directly in the
 local bits of the IP address; see RFC796 and RFC895 for a whole bunch of
 examples (the all-zeros-host-part directed broadcast address that 4.2BSD
 used came from the mapping for experimental ethernet).  This meant that
 the next hop IP address wasn't an indirection at all, it was directly
 the data you needed to construct the L2 header to add to the packet.
 The original exception to this was DIX Ethernet, with its 48 bit MAC
 addresses that were too big to store that way, so the idea of
 implementing an ARP cache in the interface code and using the next hop
 IP address as a less efficient indirection to the L2 header data for
 that type of interface, was invented to make DIX Ethernet look like a
 normal interface where the next hop IP address directly and efficiently
 provided the L2 bits you needed to know to send the packet.

 The thing is that pretty much all the networks that were normal
 in 1980 had 

Re: Improving use of rt_refcnt

2015-07-09 Thread Mouse
 The thing is that pretty much all the networks that were normal in
 1980 had disappeared by about 1990, leaving only networks that worked
 like DIX ethernet.  You would think the code would have been
 restructured for the new normal since then, but I guess old code
 dies hard.

Well, don't forget that there still are a few non-Ethernetty networks
left.  While it would make sense to optimize for Ethernet and its ilk,
the generality still needs to be kept around.

While this is hardly impossible, I daresay it reduces the motivation to
change the current system.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Improving use of rt_refcnt

2015-07-09 Thread Joerg Sonnenberger
On Sun, Jul 05, 2015 at 10:33:02PM -0700, Dennis Ferguson wrote:
 If you don't want it to work this way then you'll need to replace the
 radix tree with something that permits changes while readers are
 concurrently operating.  To take best advantage of a more modern data
 structure, however, you are still not going to want readers to ever
 write the shared data structure if that can be avoided.  The two
 atomic operations needed to increment and decrement a reference count
 greatly exceed the cost of a (well-cached) route lookup.

Let me pick the discussion up at this point since David mentioned that
my last reply was somewhat terse. I think the current radix tree serves
three different purposes right now:

(1) Manage the view of the connectivity to the outside world in a way
coherent with the administrator's intention or some routing
protocol/daemon.

(2) Provide a mechanism for finding the next-hop for traffic to not
directly attached networks.

(3) Provide a mechanism for finding L2 addresses on directly attached
networks.

Using a single data structure for this has the advantage of code sharing
and can make detailed accounting very easy. It has the problem of
overhead and mixing data of different levels of volatility. I would like
to see the three mechanisms to be separated with appropriate data
structures for each case. The first point would be moved out completely
from the hot path, the actual packet handling case. It would then be no
longer as performance sensitive, so options for storage can be more
focused on size.

For finding the next-hop, the problem is simplified. The number of
next-hop addresses is (normally) limited by the size of the network
neighborhood. Even a backend router at one of the major Internet
exchange points will not have more than a few thousand next-hops,
compared to having 200k routes or more. This can be exploited to reduce
the data size of the BMP lookup data structure and by removing redundant
entries, e.g. a longer prefix with the same next-hop as a shorter
prefix. As I mentioned in my earlier email, the next-hop entry can and
should store a reference to whatever L2 data is needed, so that no
additional search is needed.

For the L3-L2 address mapping, the problem changes from BMP search to
an exact match search. If the mapping is managed correctly, it makes
sense to do this (cheap) search first and skip the whole BMP lookup on
a match as redundant. Hash tables and the like also have nice properties
for read-mostly updates and cache density.

Joerg
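
A minimal sketch of the lookup order described above, in C: try the
cheap exact-match search for directly attached hosts first, and fall
back to the BMP search only on a miss. All names (nh_sketch,
neigh_sketch, prefix_sketch, forward_lookup) are hypothetical and not
taken from any NetBSD source. The exact-match table is a tiny
open-addressing hash that is assumed never to fill up, and a linear
scan stands in for a real BMP data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct nh_sketch { int id; };	/* hypothetical handle for next-hop/L2 data */

/* Exact-match table for directly attached hosts (open addressing). */
#define NEIGH_SLOTS 8		/* power of two; assumed never full */
struct neigh_sketch { uint32_t addr; const struct nh_sketch *nh; };

/* Prefix table for everything else. */
struct prefix_sketch { uint32_t addr; uint8_t plen; const struct nh_sketch *nh; };

static void
neigh_insert(struct neigh_sketch *tbl, uint32_t addr, const struct nh_sketch *nh)
{
	assert(nh != NULL);	/* a NULL nh marks an empty slot */
	for (unsigned i = 0; i < NEIGH_SLOTS; i++) {
		struct neigh_sketch *e = &tbl[(addr + i) & (NEIGH_SLOTS - 1)];
		if (e->nh == NULL || e->addr == addr) {
			e->addr = addr;
			e->nh = nh;
			return;
		}
	}
}

static const struct nh_sketch *
neigh_lookup(const struct neigh_sketch *tbl, uint32_t dst)
{
	for (unsigned i = 0; i < NEIGH_SLOTS; i++) {
		const struct neigh_sketch *e = &tbl[(dst + i) & (NEIGH_SLOTS - 1)];
		if (e->nh == NULL)
			return NULL;	/* an empty slot ends the probe */
		if (e->addr == dst)
			return e->nh;
	}
	return NULL;
}

/* Longest matching prefix wins; plen is assumed to be in 0..32. */
static const struct nh_sketch *
bmp_lookup(const struct prefix_sketch *tbl, size_t n, uint32_t dst)
{
	const struct nh_sketch *best = NULL;
	int best_len = -1;

	for (size_t i = 0; i < n; i++) {
		uint32_t mask = tbl[i].plen ? ~UINT32_C(0) << (32 - tbl[i].plen) : 0;
		if ((dst & mask) == (tbl[i].addr & mask) && tbl[i].plen > best_len) {
			best = tbl[i].nh;
			best_len = tbl[i].plen;
		}
	}
	return best;
}

/* Exact match first; the BMP search runs only on a miss. */
static const struct nh_sketch *
forward_lookup(const struct neigh_sketch *neigh,
    const struct prefix_sketch *pfx, size_t n, uint32_t dst)
{
	const struct nh_sketch *nh = neigh_lookup(neigh, dst);
	return nh != NULL ? nh : bmp_lookup(pfx, n, dst);
}
```

With 10.0.0.5 in the exact-match table and prefixes 10.0.0.0/24 and
0/0 in the prefix table, a lookup of 10.0.0.5 never touches the prefix
table, while 10.0.0.7 and 192.168.0.1 fall through to it.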


Re: Improving use of rt_refcnt

2015-07-08 Thread Dennis Ferguson

On 7 Jul, 2015, at 21:25 , Ryota Ozaki ozak...@netbsd.org wrote:

 BTW how do you think of separating L2 tables (ARP/NDP) from the L3
 routing tables? The separation gets rid of cloning/cloned route
 features and that makes it easy to introduce locks in route.c.
 (Currently rtrequest1 can be called recursively to remove cloned
 routes and that makes it hard to use locks.) I read your paper
 (BSDNetworking.pdf) and it seems to suggest to maintain L2 routes
 in the common routing table (I may misunderstand your opinion).

I think it is worth stepping back and thinking about what the end
result of the most common type of access to the route table (a
forwarding operation, done by a reader who wants to know what to do
with a packet it has) is going to be, since this is the operation you
want to optimize.  If the packet is to be sent out an interface then
the result of the work you are doing is that an L2 header will be
prepended to the packet and the packet will be queued to an interface
for transmission.

To make this direct and fast what you want is for the result of the
route lookup to point directly at the thing that knows what L2 header
needs to be added and which interface the packet needs to be delivered
to.  If you have that then all that remains to be done after the
route lookup is to make space at the front of the packet for the L2
header, memcpy() it in and give the resulting frame to the interface.
So you want the route lookup organized to get you from the addresses
in the packet you are processing to the L2 header and interface you
need to use to send a packet like that as directly as possible.
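
As a rough illustration of that last step, here is a minimal sketch in
C, assuming the lookup result carries a prebuilt L2 header and an
interface index. The types nexthop_sketch and pkt_sketch and the
function nexthop_encap are hypothetical stand-ins, not the real
rtentry, mbuf or if_output() interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L2_HDR_MAX 32	/* assumed upper bound on the L2 header size */

/* Hypothetical lookup result: everything needed to transmit. */
struct nexthop_sketch {
	int     ifindex;		/* interface to queue the frame to */
	uint8_t l2_len;			/* bytes of l2_hdr in use */
	uint8_t l2_hdr[L2_HDR_MAX];	/* prebuilt header, e.g. 14 bytes for Ethernet */
};

/* Hypothetical packet buffer with headroom in front of the payload. */
struct pkt_sketch {
	uint8_t buf[256];
	size_t  off;	/* offset of the first valid byte */
	size_t  len;	/* number of valid bytes */
};

/*
 * Make space at the front of the packet and copy the cached L2 header in.
 * Returns the interface index to transmit on, or -1 if there is not
 * enough headroom.
 */
static int
nexthop_encap(const struct nexthop_sketch *nh, struct pkt_sketch *p)
{
	assert(nh->l2_len <= L2_HDR_MAX);
	if (p->off < nh->l2_len)
		return -1;
	p->off -= nh->l2_len;
	p->len += nh->l2_len;
	memcpy(p->buf + p->off, nh->l2_hdr, nh->l2_len);
	return nh->ifindex;
}
```

Nothing in this path looks at a next hop IP address: the only inputs
are the header bytes and the interface stored in the lookup result.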

While we could talk about how the route lookup might be structured
to better get directly to the point (this involves splitting the
rtentry into a route part and a nexthop part, the latter being
the result of a lookup and having the data needed to deliver the
packet with minimal extra work), this probably isn't relevant to
your question.  What I did want to point out, however, is that
knowledge of the next hop IP address is (generally) entirely
unnecessary to forward a packet.  All forwarding operations want
to know is the L2 header to add to the packet.  Of course ARP or
ND will have used the next hop IP address to determine the L2 header
to attach to the packet, but once this is known all packet forwarding
wants is the result, the L2 header, and doesn't care how that was
arrived at.  What this means is that your proposed use of the next
hop IP address is a gratuitous indirection; you would be taking
something which would be best done as

route lookup -> L2 header

and instead turning this into

route lookup -> next hop IP address -> next hop address lookup -> L2
header

This will likely always be significantly more expensive than the direct
alternative.  The indirection is also easy to resolve up front, when a route
is added, so there's no need to do it over and over again for each forwarded
packet, and failing to do it when routes are installed moves yet another
data structure (per-interface) into the forwarding path that will need to
be dealt with if you eventually want to eliminate the locks.  I think
you shouldn't do this, or anything else that requires if_output() to
look at the next hop IP address, since that indirection should go away.

The neat thing about this is that the internal arrangement that makes
one think that the next hop IP address is an important result of a route
lookup (it is listed as one in the rtentry structure, and if_output()
takes it as an argument) is actually a historical artifact.  I think
this code was written in about 1980.  Then, as now, the point of the
route lookup was to determine the L2 header to prepend to the packet
and the interface to queue it to, but what was different was the networks
that existed then.  Almost all of them did IP address -> L2 header
mapping by storing the variable bits of the L2 header directly in the
local bits of the IP address; see RFC796 and RFC895 for a whole bunch of
examples (the all-zeros-host-part directed broadcast address that 4.2BSD
used came from the mapping for experimental ethernet).  This meant that
the next hop IP address wasn't an indirection at all, it was directly
the data you needed to construct the L2 header to add to the packet.
The original exception to this was DIX Ethernet, with its 48 bit MAC
addresses that were too big to store that way, so the idea of
implementing an ARP cache in the interface code and using the next hop
IP address as a less efficient indirection to the L2 header data for
that type of interface, was invented to make DIX Ethernet look like a
normal interface where the next hop IP address directly and efficiently
provided the L2 bits you needed to know to send the packet.

The thing is that pretty much all the networks that were normal
in 1980 had disappeared by about 1990, leaving only networks that
worked like DIX ethernet.  You would think the code would have been
restructured for the new 

Re: Improving use of rt_refcnt

2015-07-07 Thread Ryota Ozaki
On Mon, Jul 6, 2015 at 2:33 PM, Dennis Ferguson
dennis.c.fergu...@gmail.com wrote:

 On 5 Jul, 2015, at 19:02 , Ryota Ozaki ozak...@netbsd.org wrote:
 On Sun, Jul 5, 2015 at 6:50 PM, Joerg Sonnenberger
 jo...@britannica.bec.de wrote:
 On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
 On Sun, Jul 5, 2015 at 2:35 AM, David Young dyo...@pobox.com wrote:
 On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
 I'm trying to improve use of rt_refcnt: reducing
 abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
 route.c and extending it to treat referencing
 during packet processing (IOW, references from
 local variables) -- currently it handles only
 references between routes. The latter is needed for
 MP-safe networking.

 Do you propose to increase/decrease rt_refcnt in the packet processing
 path, using atomic instructions?

 Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
 I will introduce them later. (Using refcount(9) by riastradh would be
 good once it is committed.)

 I think the main point that David wanted to raise is that the normal
 path for packets should *not* do any ref count changes at all.

 Why? rtentry can be freed during the normal path in MP-safe world.
 Do you suggest using pserialize instead?

 I don't think either a reference count or pserialize, or anything else
 that is non-blocking for readers, can be used to protect the data
 structure rtentry's are now stored in.

I'm sorry for the confusion; our first attempt isn't intended to provide
non-blocking reader operations. Still, I had misunderstood the
characteristics of the routing table that you describe below.


 If you want readers to continue while a data structure is being modified
 then the modification must be implemented so that concurrent readers see
 the structure in a consistent state (i.e. one that produces either the
 before-the-change or the after-the-change result) at every point during
 the change.  Since the current radix tree does not work this way the only
 way to make a change to it is to block the readers while the change is
 being made, i.e. with a lock.  An rtentry will hence never be freed while
 the normal (reader) path is looking at it since you'll be preventing those
 readers from looking at anything in that structure while you are changing it.

I see what you mean. The current implementation only avoids freeing an
rtentry while there are references to it; it still modifies an rtentry
regardless of its refcnt. I'll use a lock somehow to prevent the latter.
Nonetheless, I think my patch is still useful to prevent the former.
(And in any case we have to reduce the awkward uses of refcnt.)


 If you don't want it to work this way then you'll need to replace the
 radix tree with something that permits changes while readers are
 concurrently operating.

IIUC, your rttree(9) satisfies the requirement, right? We're evaluating
rttree(9) as a replacement for the current radix tree. Do you have any
updates on rttree(9)?

  ozaki-r

 To take best advantage of a more modern data
 structure, however, you are still not going to want readers to ever
 write the shared data structure if that can be avoided.  The two
 atomic operations needed to increment and decrement a reference count
 greatly exceed the cost of a (well-cached) route lookup.

 Dennis Ferguson


Re: Improving use of rt_refcnt

2015-07-07 Thread Ryota Ozaki
On Wed, Jul 8, 2015 at 11:57 AM, Ryota Ozaki ozak...@netbsd.org wrote:
 On Mon, Jul 6, 2015 at 2:33 PM, Dennis Ferguson
 dennis.c.fergu...@gmail.com wrote:

 On 5 Jul, 2015, at 19:02 , Ryota Ozaki ozak...@netbsd.org wrote:
 On Sun, Jul 5, 2015 at 6:50 PM, Joerg Sonnenberger
 jo...@britannica.bec.de wrote:
 On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
 On Sun, Jul 5, 2015 at 2:35 AM, David Young dyo...@pobox.com wrote:
 On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
 I'm trying to improve use of rt_refcnt: reducing
 abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
 route.c and extending it to treat referencing
 during packet processing (IOW, references from
 local variables) -- currently it handles only
 references between routes. The latter is needed for
 MP-safe networking.

 Do you propose to increase/decrease rt_refcnt in the packet processing
 path, using atomic instructions?

 Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
 I will introduce them later. (Using refcount(9) by riastradh would be
 good once it is committed.)

 I think the main point that David wanted to raise is that the normal
 path for packets should *not* do any ref count changes at all.

 Why? rtentry can be freed during the normal path in MP-safe world.
 Do you suggest using pserialize instead?

 I don't think either a reference count or pserialize, or anything else
 that is non-blocking for readers, can be used to protect the data
 structure rtentry's are now stored in.

 I'm sorry for confusing you, our first attempt doesn't intend to provide
 non-blocking reader operations. But yet I misunderstood about
 the characteristic of the routing table you described below.


 If you want readers to continue while a data structure is being modified
 then the modification must be implemented so that concurrent readers see
 the structure in a consistent state (i.e. one that produces either the
 before-the-change or the after-the-change result) at every point during
 the change.  Since the current radix tree does not work this way the only
 way to make a change to it is to block the readers while the change is
 being made, i.e. with a lock.  An rtentry will hence never be freed while
 the normal (reader) path is looking at it since you'll be preventing those
 readers from looking at anything in that structure while you are changing it.

 I got what you mean. The current implementation just doesn't free a rtentry
 if there are references to it but does modify a rtentry regardless of refcnt
 of it. I'll use a lock somehow to prevent the latter. Nonetheless, I think
 my patch is still useful to prevent the former. (And anyway we have to reduce
 awkward use of refcnt.)

BTW what do you think of separating the L2 tables (ARP/NDP) from the L3
routing tables? The separation gets rid of the cloning/cloned route
features, and that makes it easy to introduce locks in route.c.
(Currently rtrequest1 can be called recursively to remove cloned
routes, and that makes it hard to use locks.) I read your paper
(BSDNetworking.pdf) and it seems to suggest maintaining L2 routes
in the common routing table (I may have misunderstood your opinion).

Thanks,
  ozaki-r



 If you don't want it to work this way then you'll need to replace the
 radix tree with something that permits changes while readers are
 concurrently operating.

 IIUC, your rttree(9) satisfies the requirement, right? We're evaluating
 rttree(9) as a replacement of the current radix tree. Do you have any
 updates on rttree(9)?

   ozaki-r

 To take best advantage of a more modern data
 structure, however, you are still not going to want readers to ever
 write the shared data structure if that can be avoided.  The two
 atomic operations needed to increment and decrement a reference count
 greatly exceed the cost of a (well-cached) route lookup.

 Dennis Ferguson


Re: Improving use of rt_refcnt

2015-07-06 Thread David Young
On Sun, Jul 05, 2015 at 11:50:12AM +0200, Joerg Sonnenberger wrote:
 I think the main point that David wanted to raise is that the normal
 path for packets should *not* do any ref count changes at all.

I wasn't trying to make a point.  I wanted to make sure that I properly
understood Ryota's plans.

Dave

-- 
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981


Re: Improving use of rt_refcnt

2015-07-05 Thread Joerg Sonnenberger
On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
 On Sun, Jul 5, 2015 at 2:35 AM, David Young dyo...@pobox.com wrote:
  On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
  I'm trying to improve use of rt_refcnt: reducing
  abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
  route.c and extending it to treat referencing
  during packet processing (IOW, references from
  local variables) -- currently it handles only
  references between routes. The latter is needed for
  MP-safe networking.
 
  Do you propose to increase/decrease rt_refcnt in the packet processing
  path, using atomic instructions?
 
 Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
 I will introduce them later. (Using refcount(9) by riastradh would be
 good once it is committed.)

I think the main point that David wanted to raise is that the normal
path for packets should *not* do any ref count changes at all.

Joerg


Re: Improving use of rt_refcnt

2015-07-05 Thread Dennis Ferguson

On 5 Jul, 2015, at 19:02 , Ryota Ozaki ozak...@netbsd.org wrote:
 On Sun, Jul 5, 2015 at 6:50 PM, Joerg Sonnenberger
 jo...@britannica.bec.de wrote:
 On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
 On Sun, Jul 5, 2015 at 2:35 AM, David Young dyo...@pobox.com wrote:
 On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
 I'm trying to improve use of rt_refcnt: reducing
 abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
 route.c and extending it to treat referencing
 during packet processing (IOW, references from
 local variables) -- currently it handles only
 references between routes. The latter is needed for
 MP-safe networking.
 
 Do you propose to increase/decrease rt_refcnt in the packet processing
 path, using atomic instructions?
 
 Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
 I will introduce them later. (Using refcount(9) by riastradh would be
 good once it is committed.)
 
 I think the main point that David wanted to raise is that the normal
 path for packets should *not* do any ref count changes at all.
 
 Why? rtentry can be freed during the normal path in MP-safe world.
 Do you suggest using pserialize instead?

I don't think either a reference count or pserialize, or anything else
that is non-blocking for readers, can be used to protect the data
structure rtentry's are now stored in.

If you want readers to continue while a data structure is being modified
then the modification must be implemented so that concurrent readers see
the structure in a consistent state (i.e. one that produces either the
before-the-change or the after-the-change result) at every point during
the change.  Since the current radix tree does not work this way the only
way to make a change to it is to block the readers while the change is
being made, i.e. with a lock.  An rtentry will hence never be freed while
the normal (reader) path is looking at it since you'll be preventing those
readers from looking at anything in that structure while you are changing it.

If you don't want it to work this way then you'll need to replace the
radix tree with something that permits changes while readers are
concurrently operating.  To take best advantage of a more modern data
structure, however, you are still not going to want readers to ever
write the shared data structure if that can be avoided.  The two
atomic operations needed to increment and decrement a reference count
greatly exceed the cost of a (well-cached) route lookup.

Dennis Ferguson
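
A minimal sketch in C of the "before-the-change or after-the-change"
property described above, assuming a structure small enough to be
replaced wholesale: the writer prepares a complete new version
privately and publishes it with one atomic pointer exchange, and
readers do a single load without writing any shared memory. The names
table_sketch, table_read and table_publish are hypothetical. Writers
are assumed to be serialized by some lock, and safe reclamation of the
old version (waiting until no reader can still hold it) is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical immutable snapshot of forwarding state. */
struct table_sketch {
	int nexthop_id;
	int mtu;
};

/* The only shared mutable word: a pointer to the current snapshot. */
static _Atomic(struct table_sketch *) current_table;

/* Reader: one acquire load; it never writes shared memory. */
static const struct table_sketch *
table_read(void)
{
	return atomic_load_explicit(&current_table, memory_order_acquire);
}

/*
 * Writer: publish a fully initialized new version with one exchange.
 * A concurrent reader sees either the old snapshot or the new one,
 * never a half-updated mixture. Returns the old version, which the
 * caller may free only once no reader can still be using it.
 */
static struct table_sketch *
table_publish(struct table_sketch *new_version)
{
	assert(new_version != NULL);
	return atomic_exchange_explicit(&current_table, new_version,
	    memory_order_acq_rel);
}
```

The current radix tree cannot be updated this way because a change
touches several nodes in place, which is why it needs a lock that
blocks readers.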


Re: Improving use of rt_refcnt

2015-07-05 Thread Ryota Ozaki
On Sun, Jul 5, 2015 at 6:50 PM, Joerg Sonnenberger
jo...@britannica.bec.de wrote:
 On Sun, Jul 05, 2015 at 02:12:18PM +0900, Ryota Ozaki wrote:
 On Sun, Jul 5, 2015 at 2:35 AM, David Young dyo...@pobox.com wrote:
  On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
  I'm trying to improve use of rt_refcnt: reducing
  abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
  route.c and extending it to treat referencing
  during packet processing (IOW, references from
  local variables) -- currently it handles only
  references between routes. The latter is needed for
  MP-safe networking.
 
  Do you propose to increase/decrease rt_refcnt in the packet processing
  path, using atomic instructions?

 Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
 I will introduce them later. (Using refcount(9) by riastradh would be
 good once it is committed.)

 I think the main point that David wanted to raise is that the normal
 path for packets should *not* do any ref count changes at all.

Why? In an MP-safe world an rtentry can be freed while the normal path
is still using it. Do you suggest using pserialize instead?

  ozaki-r


 Joerg


Re: Improving use of rt_refcnt

2015-07-04 Thread David Young
On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
 I'm trying to improve use of rt_refcnt: reducing
 abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
 route.c and extending it to treat referencing
 during packet processing (IOW, references from
 local variables) -- currently it handles only
 references between routes. The latter is needed for
 MP-safe networking.

Do you propose to increase/decrease rt_refcnt in the packet processing
path, using atomic instructions?

Dave

-- 
David Young
dyo...@pobox.com    Urbana, IL    (217) 721-9981


Re: Improving use of rt_refcnt

2015-07-04 Thread Ryota Ozaki
On Sun, Jul 5, 2015 at 2:35 AM, David Young dyo...@pobox.com wrote:
 On Sat, Jul 04, 2015 at 09:52:56PM +0900, Ryota Ozaki wrote:
 I'm trying to improve use of rt_refcnt: reducing
 abuse of it, e.g., rt_refcnt++/rt_refcnt-- outside
 route.c and extending it to treat referencing
 during packet processing (IOW, references from
 local variables) -- currently it handles only
 references between routes. The latter is needed for
 MP-safe networking.

 Do you propose to increase/decrease rt_refcnt in the packet processing
 path, using atomic instructions?

Atomic instructions aren't used yet, i.e., softnet_lock is still needed.
I will introduce them later. (Using refcount(9) by riastradh would be
good once it is committed.)

  ozaki-r
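
For illustration only, a minimal sketch in C of what an atomic
reference count on a route entry could look like, using C11 atomics
rather than the refcount(9) proposal mentioned above (whose interface
is not shown in this thread). The struct rt_sketch and both functions
are hypothetical; the real rtentry and the rt_refcnt handling in
route.c differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a route entry; not the real struct rtentry. */
struct rt_sketch {
	atomic_uint refcnt;	/* number of holders, including the table itself */
	bool        freed;	/* set instead of actually freeing, for demonstration */
};

/*
 * Take a reference. The caller must already hold a reference or hold
 * the lock that protects the table, so the count cannot be zero here.
 */
static inline void
rt_sketch_ref(struct rt_sketch *rt)
{
	unsigned int old = atomic_fetch_add_explicit(&rt->refcnt, 1,
	    memory_order_relaxed);
	assert(old > 0);
	(void)old;
}

/* Drop a reference. Returns true if this was the last one. */
static inline bool
rt_sketch_unref(struct rt_sketch *rt)
{
	unsigned int old = atomic_fetch_sub_explicit(&rt->refcnt, 1,
	    memory_order_acq_rel);
	assert(old > 0);
	if (old == 1) {
		rt->freed = true;	/* a real implementation would free rt here */
		return true;
	}
	return false;
}
```

Each packet that takes and drops a reference this way performs two
atomic read-modify-write operations on a shared cache line.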


 Dave

 --
 David Young
 dyo...@pobox.com    Urbana, IL    (217) 721-9981