This is a repost of the series I sent the other day with significant
additions and some minor mods to the original patches.

The biggest change since my last posting of this stuff is the addition
of hashing for non-prefixed policies, which speeds up policy
insert/delete and lookup.  The idea came from Alexey Kuznetsov.

I consider these bits logically complete at this point.  In all of my
stress testing the things showing up at the top of the profiles now
are bzero() in glibc, and memset/memcpy in the kernel :-)

On my desktop I can insert about 60,000 SA entries per second, and I
can insert about 130,000 SPD entries per second.  Before these changes
trying to insert even 30,000 SPD entries would probably take a half
hour or so on the same system.  So we clearly needed improvements
here.

The basic summary:

1) Hash xfrm_state objects using two dynamically sized hash tables.
   One hashes on SPI/PROTO/DADDR, the other on FAMILY/REQID/DADDR/SADDR.

   SPI/PROTO/DADDR hash is used on packet input.

   FAMILY/REQID/DADDR/SADDR hash is used on output route resolution lookup,
   and insertion conflict resolution.

   It is also used to assist handling of potential "shadowing" of xfrm_state
   objects when a new xfrm_state is inserted.
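
   To make the SPI/PROTO/DADDR table concrete, here is a rough userspace
   sketch of a chained hash table that doubles when it fills up.  It is
   not the code from the patches: struct sa_entry, sa_hash() and the
   grow-when-count-exceeds-buckets rule are all made up for illustration,
   and it only handles IPv4.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for xfrm_state (IPv4 only). */
struct sa_entry {
	uint32_t spi;
	uint8_t proto;
	uint32_t daddr;
	struct sa_entry *next;	/* chain within one bucket */
};

struct sa_table {
	struct sa_entry **buckets;
	unsigned int mask;	/* bucket count - 1; count is a power of two */
	unsigned int count;	/* number of entries stored */
};

/* Illustrative mixing function, not the kernel's hash. */
static unsigned int sa_hash(uint32_t spi, uint8_t proto, uint32_t daddr,
			    unsigned int mask)
{
	uint32_t h = spi ^ daddr ^ proto;

	h ^= h >> 16;
	h *= 0x9e3779b1u;
	h ^= h >> 15;
	return h & mask;
}

/* nbuckets must be a power of two. */
static int sa_table_init(struct sa_table *t, unsigned int nbuckets)
{
	t->buckets = calloc(nbuckets, sizeof(*t->buckets));
	if (!t->buckets)
		return -1;
	t->mask = nbuckets - 1;
	t->count = 0;
	return 0;
}

/* Double the table and rehash every entry into the new buckets. */
static void sa_table_grow(struct sa_table *t)
{
	unsigned int new_n = (t->mask + 1) * 2;
	struct sa_entry **nb = calloc(new_n, sizeof(*nb));
	unsigned int i;

	if (!nb)
		return;		/* keep using the old, smaller table */
	for (i = 0; i <= t->mask; i++) {
		struct sa_entry *e = t->buckets[i];

		while (e) {
			struct sa_entry *next = e->next;
			unsigned int h = sa_hash(e->spi, e->proto, e->daddr,
						 new_n - 1);

			e->next = nb[h];
			nb[h] = e;
			e = next;
		}
	}
	free(t->buckets);
	t->buckets = nb;
	t->mask = new_n - 1;
}

static void sa_insert(struct sa_table *t, struct sa_entry *e)
{
	unsigned int h;

	if (t->count > t->mask)	/* more entries than buckets: grow */
		sa_table_grow(t);
	h = sa_hash(e->spi, e->proto, e->daddr, t->mask);
	e->next = t->buckets[h];
	t->buckets[h] = e;
	t->count++;
}

/* The packet-input style lookup: exact match on SPI/PROTO/DADDR. */
static struct sa_entry *sa_lookup(const struct sa_table *t, uint32_t spi,
				  uint8_t proto, uint32_t daddr)
{
	struct sa_entry *e = t->buckets[sa_hash(spi, proto, daddr, t->mask)];

	for (; e; e = e->next)
		if (e->spi == spi && e->proto == proto && e->daddr == daddr)
			return e;
	return NULL;
}
```

   A second table keyed on FAMILY/REQID/DADDR/SADDR would follow the same
   pattern with a different key and hash function.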

2) Hash xfrm_policy objects by index and DADDR/SADDR if not prefixed.

   By "prefixed" we mean that either the DADDR or the SADDR
   specifies a masked subnet instead of a full IP address.

   All xfrm_policy objects go into the index hash, which is used for
   generating unique policy->index values.

   If an xfrm_policy is "prefixed" it goes onto a per-direction singly
   linked list which looks like the policy lists the code used to have.

   If an xfrm_policy is not "prefixed", it is instead inserted into
   a per-direction hash table which is consulted first on lookups.

   All of the policy hashes are dynamically sized as needed.
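
   The prefixed/non-prefixed split can be sketched roughly as below.
   This is only an illustration under some loud assumptions: struct
   policy and every helper name are hypothetical, addresses are IPv4 in
   host byte order, the bucket count is fixed rather than dynamically
   sized, there is a single direction, and priorities are ignored, so
   the first match wins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POLICY_BUCKETS 16	/* fixed here; the real tables resize */

/* Hypothetical, simplified stand-in for xfrm_policy. */
struct policy {
	uint32_t daddr, saddr;	/* host byte order, for simplicity */
	uint8_t dplen, splen;	/* prefix lengths; 32 = full address */
	struct policy *next;
};

struct policy_db {
	struct policy *exact[POLICY_BUCKETS];	/* non-prefixed policies */
	struct policy *prefixed;		/* plain linked list */
};

/* "Prefixed" means at least one side is a subnet, not a full address. */
static bool policy_is_prefixed(const struct policy *p)
{
	return p->dplen != 32 || p->splen != 32;
}

static unsigned int addr_hash(uint32_t daddr, uint32_t saddr)
{
	uint32_t h = daddr ^ saddr;

	h ^= h >> 16;
	h ^= h >> 8;
	return h & (POLICY_BUCKETS - 1);
}

static uint32_t plen_mask(uint8_t plen)
{
	return plen ? ~(uint32_t)0 << (32 - plen) : 0;
}

static bool policy_matches(const struct policy *p, uint32_t daddr,
			   uint32_t saddr)
{
	return ((daddr ^ p->daddr) & plen_mask(p->dplen)) == 0 &&
	       ((saddr ^ p->saddr) & plen_mask(p->splen)) == 0;
}

static void policy_insert(struct policy_db *db, struct policy *p)
{
	struct policy **head = policy_is_prefixed(p)
		? &db->prefixed
		: &db->exact[addr_hash(p->daddr, p->saddr)];

	p->next = *head;
	*head = p;
}

/* Consult the hash table first, then fall back to the prefixed list. */
static struct policy *policy_lookup(const struct policy_db *db,
				    uint32_t daddr, uint32_t saddr)
{
	struct policy *p;

	for (p = db->exact[addr_hash(daddr, saddr)]; p; p = p->next)
		if (policy_matches(p, daddr, saddr))
			return p;
	for (p = db->prefixed; p; p = p->next)
		if (policy_matches(p, daddr, saddr))
			return p;
	return NULL;
}
```

   The point of the split is that a flow hitting a non-prefixed policy
   only walks one short hash chain, no matter how many policies exist.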

3) xfrm_state objects were excessively reference counted.  The base
   implicit reference protects the entry while it is in the hashtables,
   and in exchange for not refcounting each timer reference we only pay
   a del_timer_sync() at GC destruction time.

4) xfrm_state insertion of transformations using ESP was computationally
   dominated by the initial IV value computation, via get_random_bytes().
   We can defer this until the first time we actually try to output a
   packet using this xfrm_state.

   This is good for another reason: if the xfrm_state is only used for
   input packets, the initial IV initialization just wastes random number
   entropy since the IV will never be used.
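
   The deferral amounts to a lazy-init pattern, roughly like the sketch
   below.  The names (struct esp_state, esp_output_iv(), fill_random())
   are hypothetical, rand() stands in for get_random_bytes() and is not
   suitable for real IVs, and locking is ignored entirely.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-SA ESP data; not the kernel's layout. */
struct esp_state {
	uint8_t iv[8];
	bool iv_initialized;
};

/* Counts how often we pay for randomness, for demonstration only. */
static unsigned int random_calls;

/* Stand-in for get_random_bytes(); rand() is NOT cryptographically safe. */
static void fill_random(uint8_t *buf, size_t len)
{
	size_t i;

	random_calls++;
	for (i = 0; i < len; i++)
		buf[i] = (uint8_t)rand();
}

/* Insertion path: no random bytes are consumed here any more. */
static void esp_state_init(struct esp_state *x)
{
	x->iv_initialized = false;
}

/* Output path: generate the IV the first time it is actually needed. */
static const uint8_t *esp_output_iv(struct esp_state *x)
{
	if (!x->iv_initialized) {
		fill_random(x->iv, sizeof(x->iv));
		x->iv_initialized = true;
	}
	return x->iv;
}
```

   An input-only state never calls esp_output_iv() in this sketch, so it
   never touches the random pool.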

5) Generation IDs are used to keep xfrm_state insert/delete from having
   to touch the xfrm_policy database and vice versa.  Previously adding
   or removing an xfrm_state required flushing policy layer cached routes
   and other ugly crap like that.

   Every time we add an xfrm_state into the hashes, we give it a new
   generation count.  When a cached route is made which points to that
   entry, the cached route records this generation count.  On every use
   of that route, we'll go through xfrm_dst_check() which will make sure
   the generation count of the cached route still matches the count of
   the xfrm_state it refers to.  If not, the route will be relooked up.

   When we insert a new xfrm_state, we look for any existing xfrm_states
   that match the same FAMILY/REQID/DADDR/SADDR.  On each such match we
   assign a new generation count to force a mismatch with any cached
   routes referring to those entries.

   A route relookup will also be forced on xfrm_state removal because
   xfrm_dst_check() makes sure that xfrm_state->km.state is set to
   XFRM_STATE_VALID.
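
   A stripped-down model of the generation check might look like the
   following.  It is only a sketch: struct sa, struct cached_route and
   the helper names are invented here, and the real xfrm_dst_check()
   does more than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for XFRM_STATE_VALID and a non-valid state. */
enum sa_km_state { SA_VALID, SA_DEAD };

/* Hypothetical, simplified xfrm_state. */
struct sa {
	uint32_t genid;
	enum sa_km_state km_state;
};

/* Hypothetical cached route that remembers the genid it was built for. */
struct cached_route {
	const struct sa *sa;
	uint32_t genid;
};

static uint32_t next_genid;

/* Called on insert, and on existing entries a new insert may shadow. */
static void sa_bump_genid(struct sa *x)
{
	x->genid = ++next_genid;
}

static void route_cache(struct cached_route *rt, const struct sa *x)
{
	rt->sa = x;
	rt->genid = x->genid;
}

/* Checked on every use; a false result means "look the route up again". */
static bool route_still_valid(const struct cached_route *rt)
{
	return rt->sa->km_state == SA_VALID && rt->genid == rt->sa->genid;
}
```

   The nice property is that neither side walks the other's database:
   the state side only writes a counter, and stale routes notice on
   their own the next time they are used.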

6) All linkage converted to hlists so that the hash tables are more
   compact.  This made xfrm_policy_insert()'s priority based insertion
   a little hairy, but overall it seems to be a clear improvement.
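
   For anyone who has not looked at hlists: the head is a single pointer
   and each node carries a pointer back to whatever points at it, so a
   sorted insert has to walk "slots" rather than nodes.  Here is a
   userspace sketch of that idea.  The types and names are made up, and
   the ascending sort on the priority value (ties go after existing
   entries) is an assumption of the sketch, not a statement about what
   xfrm_policy_insert() does.

```c
#include <stddef.h>

/* Minimal userspace imitation of an hlist. */
struct hnode {
	struct hnode *next;
	struct hnode **pprev;	/* address of the pointer that points at us */
};

struct hhead {
	struct hnode *first;	/* one pointer per bucket, hence compact */
};

/* Hypothetical policy; node is first so a cast from hnode is valid. */
struct pol {
	struct hnode node;
	unsigned int priority;
};

/* Link n into the list at the position held by *slot. */
static void hnode_link(struct hnode *n, struct hnode **slot)
{
	n->next = *slot;
	n->pprev = slot;
	if (n->next)
		n->next->pprev = &n->next;
	*slot = n;
}

static void hnode_unlink(struct hnode *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
}

/* Insert in ascending priority order, after entries of equal priority. */
static void pol_insert_sorted(struct hhead *h, struct pol *p)
{
	struct hnode **slot = &h->first;

	while (*slot && ((struct pol *)*slot)->priority <= p->priority)
		slot = &(*slot)->next;
	hnode_link(&p->node, slot);
}
```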

Well... I guess that was actually the not-so-basic summary :-)

Now that the control plane is reasonably fast I'll start looking at the
data path.  One of the first ideas, which I worked out with Herbert Xu,
is to put the policy bundle cached routes into the flow cache.