On Sun, Feb 18, 2007 at 09:21:30PM +0100, Eric Dumazet ([EMAIL PROTECTED]) 
wrote:
> Evgeniy Polyakov a e'crit :
> >On Sun, Feb 18, 2007 at 07:46:22PM +0100, Eric Dumazet 
> >([EMAIL PROTECTED]) wrote:
> >>>Why anyone do not want to use trie - for socket-like loads it has
> >>>exactly constant search/insert/delete time and scales as hell.
> >>>
> >>Because we want to be *very* fast. You cannot beat hash table.
> >>
> >>Say you have 1.000.000 tcp connections, with 50.000 incoming packets per 
> >>second to *random* streams...
> >
> >What is really good in trie, that you may have upto 2^32 connections
> >without _any_ difference in lookup performance of random streams.
> 
> So are you speaking of one memory cache miss per lookup ?
> If not, you loose.

With trie big part of it _does_ live in cache compared to hash table
where similar addresses ends up in a completely different hash entries.

> >>With a 2^20 hashtable, a lookup uses one cache line (the hash head 
> >>pointer) plus one cache line to get the socket (you need it to access its 
> >>refcounter)
> >>
> >>Several attempts were done in the past to add RCU to ehash table (last 
> >>done by Benjamin LaHaise last March). I believe this was delayed a bit, 
> >>because David would like to be able to resize the hash table...
> >
> >This is a theory.
> 
> Not theory, but actual practice, on a real machine.
> 
> # cat /proc/net/sockstat
> sockets: used 918944
> TCP: inuse 925413 orphan 7401 tw 4906 alloc 926292 mem 304759
> UDP: inuse 9
> RAW: inuse 0
> FRAG: inuse 9 memory 18360

Theory is a speculation about performance.

Highly cache usage optimized bubble sorting still much worse than 
cache usage non-optimized binary tree.
 
> >Practice includes cost for hashing, locking, and list traversal
> >(each pointer is in own cache line btw, which must be fetched) and plus
> >the same for time wait sockets (if we are unlucky).
> >
> >No need to talk about price of cache miss when there might be more
> >serious problems - for example length of the linked list to traverse each 
> >time new packet is received.
> >
> >For example lookup time in trie with 1.6 millions random 3-dimensional
> >32bit (saddr/daddr/ports) entries is about 1 microsecond on amd athlon64 
> >3500 cpu (test was ran in userspace emulator though).
> 
> 1 microsecond ? Are you kidding ? We want no more than 50 ns.

Theory again.

Existing table does not scale that good - I created (1<<20)/2 (to cover only
established part) entries table and filled it with 1 million of random entries 
-search time is about half of microsecod.

Wanna see a code? I copied Linux hash table magic into userspace and run
the same inet_hash() and inet_lookup() in a loop. Result above.

Trie is still 2 times worse, but I've just found a bug in my implementation.

-- 
        Evgeniy Polyakov
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to