Paul,

The receive side is limiting performance. Even with a single netperf session, one CPU on the receiving side is at 0% idle and the other is at 50% idle. Looking at the profiler output, most of the time is spent outside the driver. See the output below:



Profiling interrupt: 8758 events in 45.139 seconds (194 events/sec)

Count indv cuml rcnt     nsec Hottest CPU+PIL        Caller
-------------------------------------------------------------------------------
1436  16%  16% 0.00     2372 cpu[1]                 disp_getwork
 968  11%  27% 0.00     5989 cpu[0]                 bcopy
 953  11%  38% 0.00     7363 cpu[0]                 verify_and_copy_pattern
 671   8%  46% 0.00     2824 cpu[1]                 copy_pattern
 609   7%  53% 0.00     2219 cpu[1]                 idle
 467   5%  58% 0.00     2832 cpu[1]                 copyout_more
 421   5%  63% 0.00     6747 cpu[0]                 bcopy_more
 374   4%  67% 0.00     6208 cpu[0]                 getpcstack
 259   3%  70% 0.00     4768 cpu[1]                 kmem_cache_free_debug
 203   2%  73% 0.00     7278 cpu[0]                 kmem_cache_alloc_debug
 191   2%  75% 0.00     5359 cpu[0]                 mutex_enter
 115   1%  76% 0.00     6726 cpu[0]                 tcp_rput_data
 114   1%  77% 0.00     6686 cpu[0]+6               ql_build_rx_mp
 105   1%  79% 0.00     6807 cpu[0]                 atomic_add_int_nv
  97   1%  80% 0.00     6716 cpu[0]+6               ql_ring_rx



I don't know what all these "copy" routines are doing; together they seem to take about 56% of the time. We ran a similar test with an Intel card, and it spends only about 14% of its time copying. We are wondering why the difference is so large.

Could this be because our driver has alignment issues in the receive path? Due to hardware limitations, we do not place the IP header on a 4-byte boundary. What would the overhead on SPARC be if the driver passes packets up to the upper layers with the IP header on a 2-byte boundary?
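If it helps, this is roughly the workaround I have in mind: copy the received frame into a freshly allocated mblk whose read pointer is advanced by 2 bytes, so the 14-byte Ethernet header leaves the IP header on a 4-byte boundary. This is only a sketch with made-up names (ql_rx_copy, rx_buf, pkt_len), not code from our driver, and it assumes allocb returns a suitably aligned initial b_rptr:

/*
 * Sketch only: copy a received frame into a new mblk with a 2-byte
 * initial offset so the IP header lands on a 4-byte boundary.
 * Names here are illustrative, not from the real driver.
 */
#include <sys/types.h>
#include <sys/systm.h>
#include <sys/stream.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

#define	QL_IP_ALIGN	2	/* 2 + 14-byte Ethernet header = 16 */

static mblk_t *
ql_rx_copy(const uchar_t *rx_buf, size_t pkt_len)
{
	mblk_t *mp;

	/* Ask for two extra bytes so we can shift the start of data. */
	mp = allocb(pkt_len + QL_IP_ALIGN, BPRI_MED);
	if (mp == NULL)
		return (NULL);

	/* Advance b_rptr before copying; IP header is now 4-byte aligned. */
	mp->b_rptr += QL_IP_ALIGN;
	bcopy(rx_buf, mp->b_rptr, pkt_len);
	mp->b_wptr = mp->b_rptr + pkt_len;

	return (mp);
}

The obvious trade-off is that this adds exactly the kind of bcopy the profile above is showing, so I would expect to apply it only below some size threshold, or to avoid it entirely if the hardware could be told to DMA into the buffer at a 2-byte offset.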

Tom


----- Original Message -----
From: "Paul Durrant" <[email protected]>
To: "Tom Chen" <[email protected]>
Cc: <[email protected]>
Sent: Tuesday, June 02, 2009 9:57 AM
Subject: Re: [driver-discuss] GLD3 NIC driver performance tuning


Tom Chen wrote:

Is there a way in Solaris to figure out how many CPUs the system has and which interrupt is assigned to which CPU? I am wondering why, in our tests, most of the receive interrupts sometimes end up on CPU 1. I wish I could assign different interrupts to different CPUs.


From within a driver? You can look at ncpus to tell you how many CPUs you have. As for the interrupt -> CPU mapping: that can change dynamically (e.g. when a CPU is offlined), so relying on the mapping is risky. As for making sure interrupts get spread: there is a policy tweakable somewhere in the APIC code (can't remember where) and you need to make sure it's set to round-robin interrupt assignment; I'm assuming you're using MSI-X on a recent Solaris build.

  Paul


_______________________________________________
driver-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/driver-discuss
