Robert Watson wrote:
> I guess the question is: where are the cycles going? Are we suffering
> excessive cache misses in managing the slabs? Are you effectively
> "cycling through" objects rather than using a smaller set that fits
> better in the cache?

In my test setup usually only a few objects from the zone are allocated at
the same time, but they are allocated twice for every packet.
To check the UMA dependency I have made a trivial one-element cache which,
in my test case, lets me avoid two of the four allocations per packet:
.....alloc.....
- item = uma_zalloc(ng_qzone, wait | M_ZERO);
+ mtx_lock_spin(&itemcachemtx);
+ item = itemcache;
+ itemcache = NULL;
+ mtx_unlock_spin(&itemcachemtx);
+ if (item == NULL)
+         item = uma_zalloc(ng_qzone, wait | M_ZERO);
+ else
+         bzero(item, sizeof(*item));
.....free.....
- uma_zfree(ng_qzone, item);
+ mtx_lock_spin(&itemcachemtx);
+ if (itemcache == NULL) {
+         itemcache = item;
+         item = NULL;
+ }
+ mtx_unlock_spin(&itemcachemtx);
+ if (item)
+         uma_zfree(ng_qzone, item);
...............
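For reference, the patch also assumes the cache variable and its spin mutex
exist somewhere near ng_qzone in ng_base.c; roughly like this (the placement
and the init helper name ng_itemcache_init are only illustrative):
.....declarations (sketch).....
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <netgraph/ng_message.h>
#include <netgraph/netgraph.h>

static item_p itemcache;                /* single cached item, NULL when empty */
static struct mtx itemcachemtx;         /* protects itemcache */

/* called once during netgraph initialization */
static void
ng_itemcache_init(void)
{
        /* MTX_SPIN because the patch uses mtx_lock_spin()/mtx_unlock_spin() */
        mtx_init(&itemcachemtx, "ng itemcache", NULL, MTX_SPIN);
}
...............
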
To be sure that the test system is CPU-bound I have throttled it with sysctl
to 1044MHz. With this patch the throughput of my test PPPoE-to-PPPoE router
has grown from 17 to 21 Mbytes/s. The profiling results I sent earlier
promised results close to this.
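(For reference, the throttling was done through the cpufreq sysctl; the exact
OID depends on the machine, but roughly:

        sysctl dev.cpu.0.freq=1044
)
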
> Is some bit of debugging enabled that shouldn't
> be, perhaps due to a failure of ifdefs?

I have commented out all INVARIANTS and WITNESS options in the GENERIC kernel
config (roughly the lines shown below). What else should I check?
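
Assuming the stock -CURRENT GENERIC, the commented-out lines look roughly like:

#options        INVARIANTS              # Enable calls of extra sanity checking
#options        INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
#options        WITNESS                 # Enable checks to detect deadlocks and cycles
#options        WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed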
--
Alexander Motin
_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "[EMAIL PROTECTED]"