On Sat, 2 Feb 2008, Alexander Motin wrote:

Robert Watson wrote:
I guess the question is: where are the cycles going? Are we suffering excessive cache misses in managing the slabs? Are you effectively "cycling through" objects rather than using a smaller set that fits better in the cache?

In my test setup only a few objects from the zone are usually allocated at the same time, but they are allocated twice for every packet.

To check the UMA dependency I have made a trivial one-element cache, which in my test case avoids two of the four allocations per packet.

Avoiding unnecessary allocations is a good general principle, but duplicating cache logic is a bad idea. If you're able to structure the below without using locking, it strikes me you'd do much better, especially if it's in a single processing pass. Can you not use a per-thread/stack/session variable to avoid that?

.....alloc.....
-       item = uma_zalloc(ng_qzone, wait | M_ZERO);
+       mtx_lock_spin(&itemcachemtx);
+       item = itemcache;
+       itemcache = NULL;
+       mtx_unlock_spin(&itemcachemtx);

Why are you using spin locks? They are quite a bit more expensive on several hardware platforms, and any environment it's safe to call uma_zalloc() from will be equally safe to use regular mutexes from (i.e., mutex-sleepable).

+       if (item == NULL)
+               item = uma_zalloc(ng_qzone, wait | M_ZERO);
+       else
+               bzero(item, sizeof(*item));
.....free.....
-       uma_zfree(ng_qzone, item);
+       mtx_lock_spin(&itemcachemtx);
+       if (itemcache == NULL) {
+               itemcache = item;
+               item = NULL;
+       }
+       mtx_unlock_spin(&itemcachemtx);
+       if (item)
+               uma_zfree(ng_qzone, item);
...............
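
To make the lock-free suggestion a bit more concrete, here is a rough userland sketch of the same one-element cache kept in per-session state instead of behind a global mutex. It is only an illustration under assumptions: struct item, struct session, backend_alloc(), item_alloc(), item_free() and simulate() are hypothetical names, calloc()/free() stand in for uma_zalloc()/uma_zfree(), and it is only valid if a session is touched by one thread at a time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a queue item; the real layout differs. */
struct item {
	char payload[64];
};

/* Hypothetical per-session state, assumed to be used by one thread at a time. */
struct session {
	struct item *cached;          /* one-element cache, no lock needed */
	unsigned long backend_allocs; /* requests that reached the allocator */
};

/* Stand-in for uma_zalloc(ng_qzone, wait | M_ZERO). */
static struct item *
backend_alloc(struct session *s)
{
	s->backend_allocs++;
	return (calloc(1, sizeof(struct item)));
}

/* Take the cached item if there is one, otherwise fall back to the allocator. */
static struct item *
item_alloc(struct session *s)
{
	struct item *it;

	assert(s != NULL);
	it = s->cached;
	if (it != NULL) {
		s->cached = NULL;
		memset(it, 0, sizeof(*it)); /* keep the M_ZERO semantics */
		return (it);
	}
	return (backend_alloc(s));
}

/* Park the item in the empty cache slot, otherwise really free it. */
static void
item_free(struct session *s, struct item *it)
{
	assert(s != NULL);
	if (it == NULL)
		return;
	if (s->cached == NULL)
		s->cached = it;
	else
		free(it); /* stand-in for uma_zfree(ng_qzone, it) */
}

/* Run two alloc/free pairs per packet; return how often the allocator was hit. */
static unsigned long
simulate(unsigned long packets)
{
	struct session s = { NULL, 0 };
	struct item *it;
	unsigned long i;
	int j;

	for (i = 0; i < packets; i++) {
		for (j = 0; j < 2; j++) {
			it = item_alloc(&s);
			if (it == NULL)
				return (s.backend_allocs);
			item_free(&s, it);
		}
	}
	free(s.cached);
	return (s.backend_allocs);
}
```

In this simplified pattern (each item freed before the next is requested) only the very first request reaches the allocator. A real forwarding path will hold items for longer, so the saving there will be smaller; the sketch says nothing about where the cycles in UMA are actually going.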

To be sure that the test system is CPU-bound I have throttled it with sysctl to 1044MHz. With this patch the throughput of my test PPPoE-to-PPPoE router has grown from 17 to 21Mbytes/s. The profiling results I sent earlier promised a similar gain.

Is some bit of debugging enabled that shouldn't be, perhaps due to a failure of ifdefs?

I have commented out all INVARIANTS and WITNESS options in the GENERIC kernel config. What else should I check?

Hence my request for drilling down a bit on profiling -- the question I'm asking is whether profiling shows things running or taking time that shouldn't be.

Robert N M Watson
Computer Laboratory
University of Cambridge
_______________________________________________
freebsd-performance@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance