On Monday 19 February 2007 15:25, Evgeniy Polyakov wrote:
> On Mon, Feb 19, 2007 at 03:14:02PM +0100, Eric Dumazet ([EMAIL PROTECTED]) wrote:
> > > Forget about cache misses and cache lines - we have a hash table, only
> > > part of which is used (part for time-wait sockets, part for established
> > > ones).
> >
> > No, you did not read my mail. The current ehash is not as described by you.
>
> I did.
> And I also said that my tests do not have timewait sockets at all - I
> removed sk_for_each and so on, which should effectively double lookup
> time on a busy system with lots of sockets created/removed per
> timeframe (that is theory from my side already).
> Anyway, I ran the same test with an increased table too.
>
> > > Anyway, even with 2^20 (i.e. when the whole table is only used for
> > > established sockets) search time is about 360-370 nsec on a 3.7 GHz Core
> > > Duo (only one CPU is used) with 2 GB of RAM.
> >
> > Your tests are in userland, so unfortunately they are biased...
> >
> > (Unless you use hugetlb data?)
>
> No, I do not. But the same can be applied to the trie test - it is also
> performed in userspace and thus suffers from possible swapping/cache
> flushing and so on.
>
> And I doubt moving the test into the kernel will suddenly end up with 10
> times increased rates.

At least some architectures pay a high price for using vmalloc() instead of
kmalloc(), and TLB misses mean something for them. Not everybody has the
latest Intel CPU. Normally, the ehash table uses huge pages.
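(For reference: the boot-time ehash is carved out of bootmem through
alloc_large_system_hash(), so it lives in the kernel linear mapping, which
x86 backs with large pages. A minimal sketch of a 2.6-era call follows; the
identifiers and argument values are illustrative, not the exact ones from
net/ipv4/tcp.c.)

#include <linux/bootmem.h>		/* alloc_large_system_hash() */
#include <net/inet_hashtables.h>	/* struct inet_ehash_bucket */

static struct inet_ehash_bucket *ehash;	/* illustrative, not tcp_hashinfo */
static unsigned int ehash_shift;

static void __init example_ehash_init(void)
{
	/* Bootmem pages sit in the linear kernel mapping (large pages on
	 * x86), so chain walks pay no extra TLB misses, unlike a table
	 * mapped with 4K PTEs from a vmalloc() area. */
	ehash = alloc_large_system_hash("TCP established",
					sizeof(struct inet_ehash_bucket),
					0,		/* numentries: auto-size */
					13,		/* scale: ~1 bucket per 2^13 bytes of RAM */
					0,		/* flags */
					&ehash_shift,	/* returns log2(bucket count) */
					NULL,		/* bucket mask not needed here */
					0);		/* no upper limit */
}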
> Anyway, the trie test (broken implementation) is two times slower than the
> hash table (already resized), and it does not include the locking issues of
> hash table access and further scalability issues.

You mix apples and oranges. We already know locking has nothing to do with
hashing or trie-ing. We *can* put RCU on top of the existing ehash. We can
also add hash resizing if we really care.

> I think I need to fix my trie implementation to fully show its potential,
> but the original question was why a tree/trie based implementation is not
> considered at all, although it promises better performance and
> scalability.

Because you mix performance and scalability. They are not exactly the same
thing. Sometimes, high performance means *suboptimal* scalability.

Because O(1) is different from O(log N)? If N = 2^20, it certainly makes a
difference. Yes, 1% of chains might have a length > 10, but 99% of lookups
touch fewer than 4 cache lines.

With a binary tree, log2(2^20) is 20. Or maybe not? If you tell me it's 4,
I will be very pleased.
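(To put numbers on that: with N keys hashed uniformly into N buckets, chain
lengths are roughly Poisson(1) distributed, so nearly all lookups walk only
a few nodes, while a balanced binary tree over 2^20 entries costs about 20
node visits, each a potential cache miss. A minimal userland sketch of the
arithmetic; the Poisson(1) approximation is an assumption, not a measurement
from this thread.)

#include <math.h>
#include <stdio.h>

/* P(chain length == k) when N keys hash uniformly into N buckets:
 * approximately Poisson with mean 1, i.e. e^-1 / k!. */
static double poisson1(int k)
{
	double p = exp(-1.0);
	int i;

	for (i = 1; i <= k; i++)
		p /= i;
	return p;
}

int main(void)
{
	double cdf = 0.0;
	int k;

	for (k = 0; k <= 4; k++) {
		cdf += poisson1(k);
		printf("chains of length <= %d: %5.2f%%\n", k, 100.0 * cdf);
	}
	/* A balanced binary tree over 2^20 sockets visits ~log2(N) nodes. */
	printf("balanced tree depth for 2^20 entries: %.0f node visits\n",
	       log2(1 << 20));
	return 0;
}

(Build with -lm; it prints that about 99.6% of chains have length <= 4,
against ~20 node visits for the balanced tree.)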