Dear all!
We deployed a SIP server application on a multi-core SPARC machine.
The number of CPUs matches the number of application processes. All
processes wait on the same socket and do the same thing in a loop: call a
blocking recvfrom and then process the received message. The transport we
are using is UDP.
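For readers who want to reproduce the setup, here is a minimal sketch of the process model described above. It is not our production code: the function names (open_udp_socket, worker_loop, spawn_workers), the loopback address, the buffer size and the message limit are all assumptions made for illustration, and the SIP processing is reduced to a placeholder.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create a UDP socket bound to 127.0.0.1:port (port 0 lets the kernel pick).
 * Hypothetical helper; a real server would bind its SIP address instead. */
int open_udp_socket(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Worker body: block in recvfrom, then process the message.
 * Stops after max_msgs messages so the sketch terminates; returns the
 * number of messages handled, or -1 on a receive error. */
int worker_loop(int fd, int max_msgs)
{
    char buf[2048];
    int handled = 0;

    while (handled < max_msgs) {
        struct sockaddr_in peer;
        socklen_t peer_len = sizeof peer;
        ssize_t n = recvfrom(fd, buf, sizeof buf, 0,
                             (struct sockaddr *)&peer, &peer_len);
        if (n < 0)
            return -1;
        /* Placeholder: the real application parses and handles SIP here. */
        handled++;
    }
    return handled;
}

/* Fork nworkers children that all block on the same inherited socket. */
int spawn_workers(int fd, int nworkers, int msgs_per_worker)
{
    assert(nworkers > 0);
    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0) {
            int handled = worker_loop(fd, msgs_per_worker);
            _exit(handled == msgs_per_worker ? 0 : 1);
        }
    }
    return 0;
}
```

In this sketch every child inherits the same descriptor, so all of them end up sleeping in recvfrom on one socket, which is the situation in which we see the contention.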
What we observed is that when we set the number of CPUs to more than 16
(the number of user application processes is also 16), very severe
performance degradation occurs.
We used lockstat to profile the kernel and its lock contention, and
consulted the OpenSolaris source code. We found the following:
1) the function "mutex_vector_enter" accounts for 50% of the CPU time.
2) most calls to mutex_vector_enter come from "cv_wait_sig", and the call
graph is:
recvfrom -> syscall_trap32 -> recvfrom -> recvit -> sotpi_recvmsg ->
so_lock_read_intr -> cv_wait_sig
3) so_lock_read_intr appears to serialize calls to "kstrgetmsg", which
copies data from kernel space to the user buffer. Thus, only one process
can do this copy at a time. This seems unnecessary for simple UDP
processing: the kernel could hold the lock only while dequeuing the packet
from the socket queue and release it before copying the data from kernel to
user space, as Linux does.
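To make point 3 concrete, here is a user-space analogy of the locking pattern we have in mind. It is only a sketch: it is neither Solaris nor Linux kernel code, and the names (struct pkt, struct pkt_queue, queue_push, queue_recv) and the 1500-byte payload limit are invented for illustration. The lock covers only the unlinking of the packet from the queue; the copy into the caller's buffer happens after the lock is released.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define PKT_MAX 1500            /* assumed maximum payload size */

struct pkt {
    struct pkt *next;
    size_t len;
    char data[PKT_MAX];
};

struct pkt_queue {
    pthread_mutex_t lock;       /* protects head and tail only */
    struct pkt *head;
    struct pkt *tail;
};

void queue_init(struct pkt_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Append a copy of the payload to the queue. Returns 0 on success, -1 on error. */
int queue_push(struct pkt_queue *q, const void *data, size_t len)
{
    if (len > PKT_MAX)
        return -1;
    struct pkt *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->next = NULL;
    p->len = len;
    memcpy(p->data, data, len);     /* fill the packet before taking the lock */

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Receive one packet. Returns the number of bytes copied, or -1 if the
 * queue is empty. Only the dequeue step runs under the lock. */
long queue_recv(struct pkt_queue *q, void *buf, size_t buflen)
{
    pthread_mutex_lock(&q->lock);
    struct pkt *p = q->head;
    if (p != NULL) {
        q->head = p->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);

    if (p == NULL)
        return -1;

    /* The packet is now owned by this caller alone, so the copy needs no
     * lock. An oversized packet is truncated to the caller's buffer. */
    size_t n = p->len < buflen ? p->len : buflen;
    memcpy(buf, p->data, n);
    free(p);
    return (long)n;
}
```

With this pattern several receivers can copy different packets at the same time, and they contend only for the short dequeue step.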
Our question is: is there an alternative path for "recvfrom" that is
simpler than the current sotpi_recvmsg implementation and does not hold a
lock while copying data from kernel to user space?
We believe there is a more direct channel between the user application
and the network stack for UDP. Can anyone with this knowledge give us a hand?
Thanks!
Cheers
Yours
Jia