Dear all!

  We deployed a SIP server application on a multi-core SPARC machine. The
  number of application processes matches the number of CPUs. All processes
  wait on the same socket and do the same thing in a loop: call a blocking
  recvfrom and then process the received message. The transport we are using
  is UDP.
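
  For reference, the setup described above can be sketched roughly as follows.
  This is only an illustration, not our actual server code: the helper names
  (open_udp_socket, worker_loop, spawn_workers), the loopback bind address and
  the max_msgs bound are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to 127.0.0.1:port
 * (port 0 lets the kernel choose). Returns the fd, or -1 on error. */
int open_udp_socket(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Worker loop: block in recvfrom, then hand the datagram to `handle`.
 * max_msgs bounds the loop for demonstration; a negative value means
 * "run until recvfrom fails". Returns the number of datagrams handled. */
long worker_loop(int fd, long max_msgs, void (*handle)(const char *, size_t))
{
    char buf[65536];
    long handled = 0;

    assert(fd >= 0);
    while (max_msgs < 0 || handled < max_msgs) {
        struct sockaddr_storage peer;
        socklen_t peer_len = sizeof(peer);
        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                             (struct sockaddr *)&peer, &peer_len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            break;              /* any other error ends the loop */
        }
        if (handle != NULL)
            handle(buf, (size_t)n);
        handled++;
    }
    return handled;
}

/* Fork nworkers children that all block on the same socket.
 * Returns the number of children started (in the parent). */
int spawn_workers(int fd, int nworkers, void (*handle)(const char *, size_t))
{
    int started = 0;
    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            worker_loop(fd, -1, handle);
            _exit(0);
        }
        started++;
    }
    return started;
}
```

  In this sketch the parent would call open_udp_socket once and then
  spawn_workers(fd, ncpus, handler), so that every child ends up blocked in
  recvfrom on the one shared socket.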
  
  What we observed is that when we set the number of CPUs to more than 16
  (the number of user application processes is also 16), very severe
  performance degradation occurs.
  
  We used lockstat to profile the kernel and its lock contention, and
  consulted the OpenSolaris source code. Our findings are as follows:
  
  1) the function "mutex_vector_enter" accounts for 50% of the CPU time.
  
  2) most calls to mutex_vector_enter come from "cv_wait_sig", and the call
  graph is:
  recvfrom -> syscall_trap32 -> recvfrom -> recvit -> sotpi_recvmsg ->
  so_lock_read_intr -> cv_wait_sig
  
  3) so_lock_read_intr appears to serialize calls to "kstrgetmsg", which
  copies data from kernel space to the user buffer. Thus, only one process
  can perform this copy at a time. This seems unnecessary for simple UDP
  processing: the kernel could hold the lock only while dequeuing the packet
  from the socket queue and release it before copying the data from kernel to
  user space, as Linux does.
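
  To make the distinction in point 3 concrete, here is a user-space analogy
  using a pthread mutex. It is not Solaris or Linux kernel code; the struct
  and function names (msg_queue, recv_serialized, recv_short_lock) are
  invented for illustration. The first receive variant holds the lock across
  both the dequeue and the copy; the second holds it only for the dequeue.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message node; not a kernel structure. */
struct msg {
    struct msg *next;
    size_t len;
    char data[1500];
};

struct msg_queue {
    pthread_mutex_t lock;
    struct msg *head;
    struct msg *tail;
};

void queue_init(struct msg_queue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

/* Append a copy of the payload to the queue. Returns 0, or -1 on error. */
int queue_push(struct msg_queue *q, const void *data, size_t len)
{
    struct msg *m;

    if (len > sizeof(m->data))
        return -1;
    m = malloc(sizeof(*m));
    if (m == NULL)
        return -1;
    m->next = NULL;
    m->len = len;
    memcpy(m->data, data, len);

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Unlink the first message. The caller must hold q->lock. */
static struct msg *unlink_head(struct msg_queue *q)
{
    struct msg *m = q->head;
    if (m != NULL) {
        q->head = m->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return m;
}

/* Copy at most cap bytes of the payload into buf; returns bytes copied. */
static long copy_out(const struct msg *m, void *buf, size_t cap)
{
    size_t n = m->len < cap ? m->len : cap;
    memcpy(buf, m->data, n);
    return (long)n;
}

/* Variant A: the lock covers dequeue AND copy, so receivers are fully
 * serialized. Returns bytes copied, or -1 if the queue is empty. */
long recv_serialized(struct msg_queue *q, void *buf, size_t cap)
{
    long n = -1;
    pthread_mutex_lock(&q->lock);
    struct msg *m = unlink_head(q);
    if (m != NULL)
        n = copy_out(m, buf, cap);
    pthread_mutex_unlock(&q->lock);
    free(m);
    return n;
}

/* Variant B: the lock covers only the dequeue; the copy runs unlocked,
 * which is safe because the unlinked message is private to this caller. */
long recv_short_lock(struct msg_queue *q, void *buf, size_t cap)
{
    pthread_mutex_lock(&q->lock);
    struct msg *m = unlink_head(q);
    pthread_mutex_unlock(&q->lock);
    if (m == NULL)
        return -1;
    long n = copy_out(m, buf, cap);
    free(m);
    return n;
}
```

  With many receivers, variant B keeps the critical section down to a few
  pointer updates, which is the behaviour we were hoping to get from the
  kernel receive path.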
  
  Our question is: is there an alternative path for "recvfrom" that is
  simpler than the current sotpi_recvmsg implementation and does not hold a
  lock while copying data from kernel to user space?
  
  We believe there exists a more direct channel between the user application
  and the network stack for UDP. Can anyone who knows this area give us a
  hand? Thanks!
  
  Cheers
  
                                                                                
  Yours,
  Jia
  
                                                                              
  




