We know that mac_tx has two call paths. One is the syscall path, like the
following:
              dld`str_mdata_fastpath_put+0xa4
              ip`tcp_send_data+0x8b4
              ip`tcp_output+0x7ea
              ip`squeue_enter+0x330
              ip`tcp_sendmsg+0xfb
              sockfs`so_sendmsg+0x1c7
              sockfs`socket_sendmsg+0x61
              sockfs`sendit+0x167
              sockfs`send+0x78
              sockfs`send32+0x22
              unix`_sys_sysenter_post_swapgs+0x14b

The other is the worker-thread path, like the following:
              dld`str_mdata_fastpath_put+0xa4
              ip`tcp_send_data+0x8b4
              ip`tcp_send+0xb01
              ip`tcp_wput_data+0x721
              ip`tcp_rput_data+0x33d1
              ip`squeue_drain+0x179
              ip`squeue_enter+0x3f4
              ip`ip_input+0xc17
              mac`mac_rx_soft_ring_drain+0xdf
              mac`mac_soft_ring_worker+0x111
              unix`thread_start+0x8

I counted the number of mac_tx calls on each path over 10 seconds from
another terminal.
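
A minimal sketch of how such a count can be taken, reusing the mac_tx
probe from the one-liner quoted below (the tick-10s exit is my addition):

dtrace -n 'mac_tx:entry { @[stack()] = count(); } tick-10s { exit(0); }'

Each call path then shows up as a separate aggregation key, with its
count printed after the stack.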

Before running the dtrace command that improves performance, the
call-path distribution was:
Syscall Path: 641615
Worker Thread Path: 482210

After running the dtrace command, the call-path distribution was:
Syscall Path: 319273
Worker Thread Path: 1061620

Thanks
Zhihui



2009/4/1 zhihui Chen <zhchen3 at gmail.com>

> Thanks, I have tried your method and polling is now disabled. After that,
> context switches decreased a great deal, but performance still remained at
> 8.8 Gbps. mpstat output like the following:
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>   0    0   0    0    33    1   15    8    0 1366    0 134084    3  97   0   0
>   1   10   0    0    56   16 35208    6    8 1736    0     9    0  48   0  52
>   2    4   0   37 19646 19618   58    0   10  408    0   154    0  34   0  66
>   3    0   0    0   308  107  118    0    6    1    0   185    0   0   0 100
>
>
> Then I ran the same dtrace command again; performance improved to 9.5 Gbps
> and context switches also dropped from about 35000 to 6300. mpstat output
> like the following:
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>   0    0   0    0    17    3 1858    8    3  605    0 142000    3  81   0  16
>   1    0   0    0    15    6 4472    0    2 2516    0     0    0  92   0   8
>   2    0   0    0 19740 19679  126    0    6  253    0   289    0  46   0  54
>   3    0   0    9   509  208   66    0    4    0    0   272    0   0   0 100
>
> Because this is a TX-heavy workload, maybe we should pay more attention to
> the mac_soft_ring_worker thread?
>
> Thanks
> Zhihui
>
> 2009/4/1 Sunay Tripathi <Sunay.Tripathi at sun.com>
>
>> Sure. Just run this and polling will get disabled:
>> % dladm create-vnic -l ixgbe0 vnic1
>>
>> Let us know what you get with polling disabled. We don't have a
>> tunable to disable polling, but since ixgbe can't assign RX rings to
>> VNICs yet, creating a VNIC disables polling for the primary NIC as well.
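>> (If you want to undo this later, "dladm delete-vnic vnic1" should
>> remove the VNIC; I assume polling comes back once no VNIC is left on
>> the link.)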
>>
>> Cheers,
>> Sunay
>>
>> zhihui Chen wrote:
>>
>>> During my testing of 10 GbE (Intel ixgbe) with snv_110, I found that
>>> context switches are a big problem for performance.
>>> Benchmark: netperf 2.4.4
>>> Workload: TCP_STREAM (sending 8 KB TCP packets from the SUT to a remote
>>> machine)
>>>
>>> Crossbow uses two kernel threads (mac_soft_ring_worker and
>>> mac_rx_srs_poll_ring) to help send and receive packets in the kernel. On a
>>> multi-core or multi-processor system, these two threads and the NIC
>>> interrupt can run on different CPUs. Consider the following scenario on my
>>> 4-core system:
>>>    mac_soft_ring_worker: CPU 1
>>>    mac_rx_srs_poll_ring: CPU 1
>>>    Interrupt:            CPU 2
>>>
>>> I ran the workload with the application bound to the free CPU 0; I got
>>> 8.8 Gbps and mpstat output like the following:
>>> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>>   0    0   0    0    17    3   21    9    0 1093    0 134501    3  97   0   0
>>>   1    0   0    0    29   13 56972    2    7  992    0     2    0  50   0  50
>>>   2   14   0    0 19473 19455   37    0    8    0    0   149    0  28   0  72
>>>   3    0   0    1   305  104  129    0    4    1    0     9    0   0   0 100
>>> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>>   0    0   0    0    14    2    2    7    0 1120    0 133511    3  97   0   0
>>>   1    0   0    0    24   12 54501    2    6  971    0     2    0  48   0  52
>>>   2    0   0    0 19668 19648   45    0    9    0    0   149    0  28   0  72
>>>   3    0   0    0   306  104  128    0    6    0    0    11    0   0   0 100
>>> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>>   0    0   0    0    14    2   21    8    2 1107    0 134569    3  97   0   0
>>>   1    0   0    0    32   16 57522    2    6  928    0     2    0  50   0  50
>>>   2    0   0    0 19564 19542   46    0   10    1    0   140    0  28   0  72
>>>   3    0   0    0   306  104  122    0    7    0    0    58    0   0   0 100
>>>
>>>
>>> Next, I simply ran one dtrace command:
>>> dtrace -n 'mac_tx:entry{@[probefunc,stack()]=count();}'
>>> With that running, I got 9.57 Gbps and mpstat output like the following:
>>> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>>   0    0   0    0    23    5 2055    9    4  529    0 142719    3  81   0  15
>>>   1    0   0    1    21    8 24343    0    2 2523    0     0    0  88   0  12
>>>   2   14   0    5 19678 19537   81    0    5    0    0   150    0  43   0  57
>>>   3    0   0    6   308  104   93    0    5    2    0   278    0   0   0 100
>>> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>>   0    0   0    2    19    4 1998    9    6  556    0 142911    3  82   0  16
>>>   1    0   0    0    20    8 23543    1    2 2556    0     0    0  88   0  12
>>>   2    0   0    6 19647 19499  106    0    8    1    0   266    0  43   0  57
>>>   3    0   0    2   308  104   70    0    5    1    0    28    0   0   0 100
>>> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>>   0    0   0    0    21    3 1968   10    4  556    0 144547    3  82   0  15
>>>   1    0   0    0    20   10 23334    0    2 2622    0     0    0  90   0  10
>>>   2    0   0    9 19797 19658   92    0   10    1    0   274    0  44   0  56
>>>   3    0   0    0   307  104   95    0    6    2    0   182    0   0   0  99
>>>
>>> I don't think dtrace itself can improve NIC performance. If you compare
>>> the mpstat outputs, the biggest difference is that context switches have
>>> dropped sharply, from about 55000 to 23000. This leads to my point: too
>>> many context switches hinder Crossbow's performance.
>>> If I make these two kernel threads and the interrupt run on entirely
>>> different cores, performance drops to about 7.8 Gbps while context
>>> switches increase to about 80000 per second, yet interrupts remain at
>>> about 19500/s.
>>>
>>> In Crossbow, the mac_soft_ring_worker thread is woken up by
>>> mac_rx_srs_poll_ring and by the interrupt path, via calls to
>>> mac_soft_ring_worker_wakeup. My thinking is that if I can disable the
>>> polling function, context switches should be reduced.
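>>> To see how often each path issues this wakeup, a sketch like the
>>> following could be used (assuming mac_soft_ring_worker_wakeup is
>>> visible to the fbt provider, which the stacks above suggest):
>>> dtrace -n 'fbt::mac_soft_ring_worker_wakeup:entry
>>>     { @[stack(3)] = count(); } tick-10s { exit(0); }'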
>>>  Thanks
>>> Zhihui
>>>
>>>
>>>
>>> 2009/4/1 rajagopal kunhappan <rajagopal.kunhappan at sun.com>
>>>
>>>    Hi Zhihui,
>>>
>>>
>>>        In Crossbow, each mac_srs has a kernel thread called
>>>        "mac_rx_srs_poll_ring" to poll the hardware, and Crossbow
>>>        wakes this thread up automatically to pull packets from the
>>>        hardware. Does Crossbow provide any method to disable the
>>>        polling mechanism, for example by disabling this kernel
>>>        thread?
>>>
>>>
>>>    Presently no. Can we know why you would want to do that?
>>>
>>>    Thanks,
>>>    -krgopi
>>>
>>>        Thanks
>>>        Zhihui