Hi.

Some may remember that not so long ago I complained about high lock congestion on pbuf_mtx. At that time, switching the mutex to padalign reduced the problem. But now, after improving scalability in CAM and GEOM and doing more than half a million IOPS on a 32-core system, I am hitting that problem heavily again: hwpmc shows about 30% of CPU time spent spinning on that mutex, and another 30% spent on threads trying to go to sleep on that mutex and getting more collisions there.

To mitigate that, I've made a patch (http://people.freebsd.org/~mav/pcpu_pbuf.patch) that splits the single queue of pbufs into several. That definitely costs some amount of KVA and memory, but in my tests it fixes the problem radically, removing any measurable congestion there. The patch is not complete and doesn't even boot on i386 yet, but I would like to hear opinions about the approach, or maybe some better proposals.
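
To illustrate the general idea of splitting one queue into several, here is a rough userland sketch. It is not taken from the patch: the names (bufq, buf_get, buf_put), the NQUEUES and BUFS_PER_QUEUE constants, and the atomic_flag spinlock standing in for a kernel mutex are all made up for illustration. In the kernel the queue would presumably be picked by the current CPU; here the caller passes a hint.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NQUEUES        4   /* hypothetical number of independent free lists */
#define BUFS_PER_QUEUE 8   /* hypothetical number of buffers per list */

struct buf {
    struct buf *next;   /* link on its home free list */
    int home;           /* index of the list this buffer belongs to */
};

struct bufq {
    atomic_flag lock;                   /* tiny spinlock standing in for a mutex */
    struct buf *free;                   /* head of this list's free buffers */
    struct buf store[BUFS_PER_QUEUE];   /* backing storage for this list */
};

static struct bufq queues[NQUEUES];

static void bufq_lock(struct bufq *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;   /* spin until the holder clears the flag */
}

static void bufq_unlock(struct bufq *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Build every free list; each buffer remembers which list owns it. */
static void bufq_init(void)
{
    for (int i = 0; i < NQUEUES; i++) {
        struct bufq *q = &queues[i];

        atomic_flag_clear(&q->lock);
        q->free = NULL;
        for (int j = 0; j < BUFS_PER_QUEUE; j++) {
            q->store[j].home = i;
            q->store[j].next = q->free;
            q->free = &q->store[j];
        }
    }
}

/* Take a buffer from the list selected by hint; NULL if that list is empty. */
static struct buf *buf_get(unsigned hint)
{
    struct bufq *q = &queues[hint % NQUEUES];
    struct buf *b;

    bufq_lock(q);
    b = q->free;
    if (b != NULL)
        q->free = b->next;
    bufq_unlock(q);
    return b;
}

/* Return a buffer to the list it came from, touching only that list's lock. */
static void buf_put(struct buf *b)
{
    struct bufq *q = &queues[b->home];

    bufq_lock(q);
    b->next = q->free;
    q->free = b;
    bufq_unlock(q);
}
```

Threads that use different hints never touch the same lock, which is where the reduced congestion would come from. The price is the one mentioned above: each list needs its own reserve of buffers, and this sketch simply fails when one list runs dry instead of sleeping or borrowing from a neighbour.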

Another patch I've made (http://people.freebsd.org/~mav/si_threadcount.patch) removes lock acquisition from dev_relthread() by using atomics for reference counting. That fixes another congestion point I see. This patch looks fine to me, and the only congestion I see after it is on HBA driver locks, but maybe I am missing something?
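
As a rough illustration of dropping a reference without taking a lock, here is a userland sketch using C11 atomics. It is not the patch itself: struct dev_sketch, dev_ref() and dev_rel() are hypothetical names, and the kernel would use its own atomic(9) primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a device with a count of threads inside it. */
struct dev_sketch {
    atomic_long threadcount;
};

/* Take a reference. Validity checks that may still need a lock are omitted. */
static void dev_ref(struct dev_sketch *d)
{
    atomic_fetch_add_explicit(&d->threadcount, 1, memory_order_acquire);
}

/*
 * Drop a reference with a single atomic decrement and no lock.
 * Release ordering makes this thread's earlier accesses to the device
 * visible before the count is seen to go down.
 * Returns the count remaining after the decrement.
 */
static long dev_rel(struct dev_sketch *d)
{
    long old = atomic_fetch_sub_explicit(&d->threadcount, 1,
        memory_order_release);

    assert(old > 0);    /* dropping a reference that was never taken is a bug */
    return old - 1;
}
```

The thing to check in a change like this is whoever waits for the count to reach zero: with no lock on the release path, that waiter has to read the counter with matching acquire semantics.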

Thank you.

--
Alexander Motin
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
