I suppose the reason for the first gain is an increased rate of CPU cache hits.
In the current FreeBSD code, allocations and frees deal with
separate buckets, which are exchanged when one of them
becomes full or empty. In Solaris this works in pure LIFO fashion:
alloc() and free() both operate on the current bucket
(it is called a magazine there), which is why the cache hit rate is higher.
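For illustration, the LIFO scheme boils down to something like the
following userland sketch (cache_alloc/cache_free and the fixed bucket
size are made-up names for this sketch, not the real UMA interfaces):
the item returned by the next alloc() is the one that was just freed
and is still hot in the CPU cache.

#include <stddef.h>

#define BUCKET_ENTRIES 128      /* arbitrary size for the sketch */

struct bucket {
    int   ub_cnt;                       /* number of cached items */
    void *ub_bucket[BUCKET_ENTRIES];    /* cached items, LIFO order */
};

/* Pop the most recently freed (cache-hot) item off the top. */
static void *
cache_alloc(struct bucket *b)
{
    if (b->ub_cnt > 0)
        return (b->ub_bucket[--b->ub_cnt]);
    return (NULL);      /* empty: fall back to the slab layer */
}

/* Push onto the same bucket, so the next alloc sees this item first. */
static int
cache_free(struct bucket *b, void *item)
{
    if (b->ub_cnt < BUCKET_ENTRIES) {
        b->ub_bucket[b->ub_cnt++] = item;
        return (1);
    }
    return (0);         /* full: the caller must empty or swap buckets */
}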
The second optimization is trivial, for example:
- bucket->ub_cnt--;
- item = bucket->ub_bucket[bucket->ub_cnt];
+ item = bucket->ub_bucket[--bucket->ub_cnt];
(see the patch)
The test program:
#include <sys/types.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    int *fd, n, i, j, iters = 100000;

    n = atoi(argv[1]);
    fd = malloc(sizeof(*fd) * n);
    iters /= n;         /* keep the total number of sockets constant */
    for (i = 0; i < iters; i++) {
        for (j = 0; j < n; j++)
            fd[j] = socket(AF_UNIX, SOCK_STREAM, 0);
        for (j = 0; j < n; j++)
            close(fd[j]);
    }
    return (0);
}
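(The program can be built with something like "cc -O2 -o sockloop
sockloop.c" and run under time(1) as shown below.)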
The results with the current uma_core.c:
time ./sockloop 1    # the first argument is the number of sockets created per iteration
0.093u 2.650s 0:02.75 99.6% 5+180k 0+0io 0pf+0w
time ./sockloop 1
0.108u 2.298s 0:02.41 99.1% 5+181k 0+0io 0pf+0w
time ./sockloop 1
0.127u 2.278s 0:02.41 99.1% 5+177k 0+0io 0pf+0w
time ./sockloop 10   # the iteration count is scaled by the argument, so the total work is constant (see code)
0.054u 2.239s 0:02.30 99.1% 5+181k 0+0io 0pf+0w
time ./sockloop 10
0.069u 2.199s 0:02.27 99.1% 6+184k 0+0io 0pf+0w
time ./sockloop 10
0.086u 2.185s 0:02.28 99.1% 5+178k 0+0io 0pf+0w
time ./sockloop 100
0.101u 2.393s 0:02.51 99.2% 5+179k 0+0io 0pf+0w
time ./sockloop 100
0.085u 2.505s 0:02.60 99.2% 5+180k 0+0io 0pf+0w
time ./sockloop 100
0.054u 2.441s 0:02.50 99.6% 5+178k 0+0io 0pf+0w
time ./sockloop 1000
0.093u 2.739s 0:02.84 99.2% 5+181k 0+0io 0pf+0w
time ./sockloop 1000
0.085u 2.797s 0:02.89 99.3% 5+180k 0+0io 0pf+0w
time ./sockloop 1000
0.117u 2.689s 0:02.82 98.9% 5+179k 0+0io 0pf+0w
The results with the first optimization only (bucket management):
time ./sockloop 1
0.125u 1.938s 0:02.06 99.5% 5+180k 0+0io 0pf+0w
time ./sockloop 1
0.070u 1.993s 0:02.06 100.0% 5+180k 0+0io 0pf+0w
time ./sockloop 1
0.110u 1.953s 0:02.06 100.0% 5+177k 0+0io 0pf+0w
time ./sockloop 10
0.093u 1.776s 0:01.87 99.4% 5+180k 0+0io 0pf+0w
time ./sockloop 10
0.116u 1.754s 0:01.87 99.4% 5+181k 0+0io 0pf+0w
time ./sockloop 10
0.093u 1.777s 0:01.87 99.4% 5+181k 0+0io 0pf+0w
time ./sockloop 100
0.100u 2.182s 0:02.29 99.5% 5+180k 0+0io 0pf+0w
time ./sockloop 100
0.093u 2.174s 0:02.27 99.5% 5+180k 0+0io 0pf+0w
time ./sockloop 100
0.078u 2.158s 0:02.24 99.1% 5+180k 0+0io 0pf+0w
time ./sockloop 1000
0.101u 2.403s 0:02.51 99.6% 5+180k 0+0io 0pf+0w
time ./sockloop 1000
0.124u 2.381s 0:02.52 99.2% 5+180k 0+0io 0pf+0w
time ./sockloop 1000
0.125u 2.373s 0:02.51 99.2% 5+178k 0+0io 0pf+0w
The results with both optimizations:
time ./sockloop 1
0.062u 1.785s 0:01.85 99.4% 5+180k 0+0io 0pf+0w
time ./sockloop 1
0.124u 1.722s 0:01.85 99.4% 5+180k 0+0io 0pf+0w
time ./sockloop 1
0.087u 1.759s 0:01.85 98.9% 5+177k 0+0io 0pf+0w
time ./sockloop 10
0.069u 1.684s 0:01.75 99.4% 5+181k 0+0io 0pf+0w
time ./sockloop 10
0.070u 1.673s 0:01.74 100.0% 5+180k 0+0io 0pf+0w
time ./sockloop 10
0.070u 1.672s 0:01.74 100.0% 5+177k 0+0io 0pf+0w
time ./sockloop 100
0.077u 2.102s 0:02.18 99.5% 5+180k 0+0io 0pf+0w
time ./sockloop 100
0.116u 2.062s 0:02.18 99.5% 5+180k 0+0io 0pf+0w
time ./sockloop 100
0.055u 2.126s 0:02.19 99.0% 5+178k 0+0io 0pf+0w
time ./sockloop 1000
0.077u 2.298s 0:02.39 98.7% 5+181k 0+0io 0pf+0w
time ./sockloop 1000
0.070u 2.340s 0:02.42 99.5% 5+178k 0+0io 0pf+0w
time ./sockloop 1000
0.054u 2.320s 0:02.39 99.1% 5+179k 0+0io 0pf+0w
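Averaging the three runs of each case above gives roughly:

arg     current     buckets only    both patches
1       2.52s       2.06s           1.85s
10      2.28s       1.87s           1.74s
100     2.54s       2.27s           2.18s
1000    2.85s       2.51s           2.40s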
The patch is against uma_core.c from RELENG_5, but I checked
uma_core.c in CURRENT and it is the same with respect to these
changes. I don't have any commit rights, so the patch is offered
for review. Here it is:
--- sys/vm/uma_core.c.orig	Wed Jun 29 21:46:52 2005
+++ sys/vm/uma_core.c	Wed Jun 29 23:09:32 2005
@@ -1830,8 +1830,7 @@
     if (bucket) {
         if (bucket->ub_cnt > 0) {
-            bucket->ub_cnt--;
-            item = bucket->ub_bucket[bucket->ub_cnt];
+            item = bucket->ub_bucket[--bucket->ub_cnt];
 #ifdef INVARIANTS
             bucket->ub_bucket[bucket->ub_cnt] = NULL;
 #endif
@@ -2252,7 +2251,7 @@
     cache = &zone->uz_cpu[cpu];
 zfree_start:
-    bucket = cache->uc_freebucket;
+    bucket = cache->uc_allocbucket;
     if (bucket) {
         /*
@@ -2263,8 +2262,7 @@
         if (bucket->ub_cnt < bucket->ub_entries) {
             KASSERT(bucket->ub_bucket[bucket->ub_cnt] == NULL,
                 ("uma_zfree: Freeing to non free bucket index."));
-            bucket->ub_bucket[bucket->ub_cnt] = item;
-            bucket->ub_cnt++;
+            bucket->ub_bucket[bucket->ub_cnt++] = item;
 #ifdef INVARIANTS
             ZONE_LOCK(zone);
             if (keg->uk_flags & UMA_ZONE_MALLOC)
@@ -2275,7 +2273,7 @@
 #endif
             CPU_UNLOCK(cpu);
             return;
-        } else if (cache->uc_allocbucket) {
+        } else if (cache->uc_freebucket) {
 #ifdef UMA_DEBUG_ALLOC
             printf("uma_zfree: Swapping buckets.\n");
 #endif
@@ -2283,8 +2281,7 @@
              * We have run out of space in our freebucket.
              * See if we can switch with our alloc bucket.
              */
-            if (cache->uc_allocbucket->ub_cnt <
-                cache->uc_freebucket->ub_cnt) {
+            if (cache->uc_freebucket->ub_cnt == 0) {
                 bucket = cache->uc_freebucket;
                 cache->uc_freebucket = cache->uc_allocbucket;
                 cache->uc_allocbucket = bucket;
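Putting the free-path pieces together, the patched fast path behaves
roughly like the following sketch (simplified stand-alone types of my
own; no locking, no INVARIANTS, and the slow path is just a failure
return - the real code is in uma_zfree_arg):

#include <stddef.h>

/* Hypothetical simplified types; the real ones are in the UMA headers. */
struct bucket {
    int    ub_cnt;      /* items currently in the bucket */
    int    ub_entries;  /* bucket capacity */
    void **ub_bucket;   /* array of cached items */
};
struct cache {
    struct bucket *uc_allocbucket;  /* frees now go here first (LIFO) */
    struct bucket *uc_freebucket;   /* spare bucket */
};

static int
zfree_fast(struct cache *cache, void *item)
{
    struct bucket *bucket;

retry:
    bucket = cache->uc_allocbucket;
    if (bucket != NULL) {
        if (bucket->ub_cnt < bucket->ub_entries) {
            /* Room in the alloc bucket: the next alloc pops this item. */
            bucket->ub_bucket[bucket->ub_cnt++] = item;
            return (1);
        }
        if (cache->uc_freebucket != NULL &&
            cache->uc_freebucket->ub_cnt == 0) {
            /* Alloc bucket full, spare bucket empty: swap and retry. */
            bucket = cache->uc_freebucket;
            cache->uc_freebucket = cache->uc_allocbucket;
            cache->uc_allocbucket = bucket;
            goto retry;
        }
    }
    return (0);         /* fall back to the slow path */
}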
If someone decides to commit the first optimization (the bucket
management change), some adjustments will also be needed to keep
the statistics gathering correct.
Regards,
Andriy Tkachuk.