DTrace shows that there are many calls to allocb_oversize if
tcp_xmit_hiwat is set to a value greater than 72400.
# dtrace -n 'allocb_oversize:entry{@[stack(),arg0]=count();}' -c "sleep 10"
dtrace: description 'allocb_oversize:entry' matched 1 probe
dtrace: pid 15355 has exited
genunix`allocb+0x4a
sockfs`socopyinuio+0xbc
sockfs`so_sendmsg+0x1a1
sockfs`socket_sendmsg+0x61
sockfs`socket_vop_write+0x63
genunix`fop_write+0x6b
genunix`write+0x2e2
unix`sys_syscall+0x17b
73792 34
genunix`allocb+0x4a
sockfs`socopyinuio+0xbc
sockfs`so_sendmsg+0x1a1
sockfs`socket_sendmsg+0x61
sockfs`socket_vop_write+0x63
genunix`fop_write+0x6b
genunix`write+0x2e2
unix`sys_syscall+0x17b
73968 30667
genunix`allocb+0x4a
sockfs`socopyinuio+0xbc
sockfs`so_sendmsg+0x1a1
sockfs`socket_sendmsg+0x61
sockfs`socket_vop_write+0x63
genunix`fop_write+0x6b
genunix`writev+0x41a
unix`sys_syscall+0x17b
73968 70907
From the source code of allocb(), it allocates memory for small messages with
size <= DBLK_MAX_CACHE (73728) from the dblk caches. But for large messages
with size > DBLK_MAX_CACHE, allocb() calls allocb_oversize(), which uses
kmem_alloc() to allocate the memory and installs dblk_lastfree_oversize as the
free function. When these oversized messages need to be freed,
dblk_lastfree_oversize() is called, which calls kmem_free() to free the memory
and leads to the xcalls.
From the header of usr/src/uts/common/io/stream.c there is the following
paragraph. Does this mean that we should not set tcp_xmit_hiwat larger than
DBLK_MAX_CACHE?
    The sizes of the allocb() small-message caches are not magical.  They
    represent a good trade-off between internal and external fragmentation
    for current workloads.  They should be reevaluated periodically,
    especially if allocations larger than DBLK_MAX_CACHE become common.
    We use 64-byte alignment so that dblks don't straddle cache lines
    unnecessarily.
Thanks
Zhihui
2009/7/13 zhihui Chen <[email protected]>
> Tried more different settings for the TCP parameters, and found that the
> following profiles lead to this kind of xcall storm while the workload runs:
> (1) tcp_max_buf >= 1048576
> (2) tcp_max_buf < 1048576 and tcp_xmit_hiwat >= 131072
>
> During testing there was no modification to the application or its input; I
> only changed the TCP parameters through ndd. The application tries to set the
> socket send buffer size to 1048576 through setsockopt. When tcp_max_buf is
> set to a value < 1048576, the setsockopt call fails and reports an error, but
> the application continues to run.
>
> Thanks
> Zhihui
>
> 2009/7/13 zhihui Chen <[email protected]>
>
>> Thanks, Steve and Andrew. I have tried the following two methods:
>> (1) Used mdb to set mblk_pull_len to 0. The xcall rate is still very high,
>> the same as before.
>> (2) Set tcp_max_buf to 65536 to limit the size of the send and receive
>> buffers. The xcall rate and kernel utilization are reduced very much, and
>> "mpstat -a 2" has the following output:
>>
>> SET minf mjf  xcal   intr  ithr    csw  icsw  migr  smtx srw  syscl usr sys wt idl sze
>>   0   76   0 17386  19029   876   1329    99   130   298   2   1719   0   1  0  99  16
>> SET minf mjf  xcal   intr  ithr    csw  icsw  migr  smtx srw  syscl usr sys wt idl sze
>>   0 8152   0   725 161524 119596 650416 43958 40670 65765   3 356206   7  52  0  41  16
>> SET minf mjf  xcal   intr  ithr    csw  icsw  migr  smtx srw  syscl usr sys wt idl sze
>>   0 6460   0   804 163550 121536 663085 44498 42610 66349  15 352296   6  53  0  40  16
>> SET minf mjf  xcal   intr  ithr    csw  icsw  migr  smtx srw  syscl usr sys wt idl sze
>>   0 7560   0   711 168416 123996 667632 46738 46100 67780  13 374021   7  53  0  40  16
>>
>>
>> 2009/7/10 Andrew Gallatin <[email protected]>
>>
>> Steve Sistare wrote:
>>>
>>>> This has the same signature as CR:
>>>>
>>>> 6694625 Performance falls off the cliff with large IO sizes
>>>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6694625
>>>>
>>>> which was raised in the network forum thread "expensive pullupmsg in
>>>> kstrgetmsg()"
>>>>
>>>> http://www.opensolaris.org/jive/thread.jspa?messageID=229124
>>>>
>>>> This was fixed in nv_112, so it just missed OpenSolaris 2009.06.
>>>>
>>>> The discussion says that the issue gets worse when the RX socket
>>>> buffer size is increased, so you could try reducing it as a workaround.
>>>>
>>>
>>> I'm the one who filed that bug.
>>>
>>> You can tell if it's this particular bug by using mdb to
>>> set mblk_pull_len to 0 and seeing if that reduces the
>>> xcalls due to dblk_lastfree_oversize.
>>>
>>> Cheers,
>>>
>>> Drew
>>>
>>
>>
>
_______________________________________________
perf-discuss mailing list
[email protected]