Robert Watson wrote:
I guess the question is: where are the cycles going? Are we suffering excessive cache misses in managing the slabs? Are you effectively "cycling through" objects rather than using a smaller set that fits better in the cache?

In my test setup only a few objects from the zone are usually allocated at the same time, but they are allocated twice for every packet.

To check the dependency on UMA I have made a trivial one-element cache, which in my test case avoids two of the four allocations per packet.
.....alloc.....
-       item = uma_zalloc(ng_qzone, wait | M_ZERO);
+       mtx_lock_spin(&itemcachemtx);
+       item = itemcache;
+       itemcache = NULL;
+       mtx_unlock_spin(&itemcachemtx);
+       if (item == NULL)
+               item = uma_zalloc(ng_qzone, wait | M_ZERO);
+       else
+               bzero(item, sizeof(*item));
.....free.....
-       uma_zfree(ng_qzone, item);
+       mtx_lock_spin(&itemcachemtx);
+       if (itemcache == NULL) {
+               itemcache = item;
+               item = NULL;
+       }
+       mtx_unlock_spin(&itemcachemtx);
+       if (item)
+               uma_zfree(ng_qzone, item);
...............
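
For anyone who wants to play with the idea outside the kernel, here is a rough userspace sketch of the same one-element cache. It is not the patch itself: a pthread mutex stands in for the spin mutex, calloc()/free() stand in for uma_zalloc()/uma_zfree(), and struct item, backend_alloc(), backend_free() and backend_allocs are made-up names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the netgraph queue item; the real layout differs. */
struct item {
	char payload[64];
};

static pthread_mutex_t itemcache_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct item *itemcache = NULL;	/* the single cached item, or NULL */

/* Counts trips to the backing allocator (not thread-safe, demo only). */
static unsigned long backend_allocs = 0;

/* Assumed stand-in for uma_zalloc(zone, M_ZERO): returns zeroed memory. */
static struct item *
backend_alloc(void)
{
	backend_allocs++;
	return (calloc(1, sizeof(struct item)));
}

/* Assumed stand-in for uma_zfree(). */
static void
backend_free(struct item *item)
{
	free(item);
}

struct item *
item_alloc(void)
{
	struct item *item;

	/* Take whatever is in the cache slot, leaving it empty. */
	pthread_mutex_lock(&itemcache_mtx);
	item = itemcache;
	itemcache = NULL;
	pthread_mutex_unlock(&itemcache_mtx);

	if (item == NULL)
		return (backend_alloc());	/* cache miss */
	memset(item, 0, sizeof(*item));		/* cache hit: zero it ourselves */
	return (item);
}

void
item_free(struct item *item)
{
	assert(item != NULL);

	/* Park the item in the slot if it is empty. */
	pthread_mutex_lock(&itemcache_mtx);
	if (itemcache == NULL) {
		itemcache = item;
		item = NULL;
	}
	pthread_mutex_unlock(&itemcache_mtx);

	/* Slot was already occupied: hand the item back to the allocator. */
	if (item != NULL)
		backend_free(item);
}
```

With an alloc/free/alloc pattern like the one in my test, the second item_alloc() is served from the slot and never reaches the backing allocator. The lock is held only for the pointer swap, so zeroing and the fallback allocation happen outside it.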

To make sure the test system is CPU-bound, I throttled it to 1044MHz via sysctl. With this patch, the throughput of my test PPPoE-to-PPPoE router grew from 17 to 21Mbytes/s, a gain of roughly 24%. The profiling results I sent earlier predicted a similar improvement.

Is some bit of debugging enabled that shouldn't be, perhaps due to a failure of ifdefs?

I have commented out all INVARIANTS and WITNESS options in the GENERIC kernel config. What else should I check?

--
Alexander Motin
_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance