Benjamin LaHaise wrote:
> Hello again,
> This patch introduces the use of rcu for the ipv4 established connections
> hashtable, as well as the timewait table since they are closely intertwined.
> This removes 4 atomic operations per packet from the tcp_v4_rcv codepath,
> which helps quite a bit when the other performance barriers in the system
> are removed. Eliminating the rwlock cache bouncing should also help on SMP
> systems.
> By itself, this improves local netperf performance on a P4/HT by ~260Mbit/s
> on average. With smaller packets (say, ethernet size) the difference should
> be larger.
On second thought, do you think we still need one rwlock per hash chain?
TCP established hash table entries: 1048576 (order: 12, 16777216 bytes)
On this x86_64 machine, we 'waste' 8 MB of RAM on those rwlocks.
With RCU, we touch these rwlocks only on TCP connection creation/deletion,
so maybe we could reduce this to a single rwlock, or to a hashed array of 2^N
rwlocks (with N depending on NR_CPUS), like in net/ipv4/route.c?
Eric