On Mon, May 3, 2010 at 10:11 PM, Giovanni Tirloni <[email protected]> wrote:

> On Sun, May 2, 2010 at 10:56 PM, Giovanni Tirloni <[email protected]> wrote:
>
>> Hello,
>>
>>  I think we've been hit by bug 6908043: dladm show-ether stopped showing
>> any interfaces at all.
>>
>>  All details added to
>> http://www.sysdroid.com/opensolaris/bugs/6908043.txt
>>
>>  The only recent changes were moving LACP from "short" to "long" on aggr0
>> (e1000g1+e1000g2) and, a few days ago, starting to use "dladm show-ether"
>> in Zabbix to monitor interface status. So I don't know whether this was
>> happening before and we never noticed, or whether heavy use of dladm
>> show-ether is triggering the problem.
>>
>
> It turned out that dlmgmtd(1M) was stuck and had to be restarted:
>
> # svcadm restart datalink-management
>
> # dladm show-ether
> LINK            PTYPE    STATE    AUTO  SPEED-DUPLEX    PAUSE
> e1000g0         current  up       yes   1G-f            bi
> e1000g1         current  up       yes   1G-f            bi
> e1000g2         current  up       yes   1G-f            bi
> e1000g3         current  up       yes   1G-f            bi
>
> I'm still trying to understand what causes it. Perhaps dladm should have
> better error reporting in case it doesn't get a satisfactory answer from
> /dev/dld.
>

There seems to be a memory leak in dlmgmtd, since it is using 3.9GB of memory
(12% of 32GB).

Should I file a new bug? If anyone is interested, I can send the core dump.

I also updated the file below with DTrace output of the functions called
when "dladm show-ether" is issued from another terminal on the same server
(it's quite long).

  http://www.sysdroid.com/opensolaris/bugs/6908043.txt

# ps aux | grep dlmgmt
USER       PID %CPU %MEM   SZ  RSS TT       S    START  TIME COMMAND
dladm       15  0.0 12.1 4039376 4039376 ?        S   Mar 30  6:47 /sbin/dlmgmtd

# gcore 15
# ls -lh core.15
-rw-r--r-- 1 root root 3.9G 2010-05-03 22:17 core.15
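
A leak would show up as an RSS that keeps growing between samples, so it may
help to log the daemon's RSS periodically (for example from the same Zabbix
setup). A minimal sketch, assuming that "ps -o rss= -p PID" reports the
resident size in kilobytes; the helper names rss_kb_to_gib and rss_of_pid are
hypothetical, and 4039376 is the RSS value read from the ps output above:

```shell
#!/bin/sh
# Hypothetical helper: convert a ps RSS value (kilobytes) to GiB,
# printed with two decimal places.
rss_kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1024 / 1024 }'
}

# Hypothetical helper: print the RSS (in KB) of the given PID.
# Assumes the POSIX ps options -o and -p are available.
rss_of_pid() {
    ps -o rss= -p "$1" | tr -d ' '
}

# RSS figure taken from the ps output above (PID 15, /sbin/dlmgmtd).
rss_kb_to_gib 4039376    # prints 3.85
```

On the live system, something like rss_kb_to_gib "$(rss_of_pid 15)" run from
cron would give one data point per sample; a steadily rising value would
support the leak theory.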

# pstack 15
15:    /sbin/dlmgmtd
-----------------  lwp# 1 / thread# 1  --------------------
 feef0547 pause    ()
 08053a18 main     (1, 8047e50, 8047e58, 8047e0c) + b8
 0805326d _start   (1, 8047ef0, 0, 8047efe, 8047f0e, 8047f1f) + 7d
-----------------  lwp# 2 / thread# 2  --------------------
 feef0ea1 door     (fec9e980, 410, 0, fec9ee00, f5f00, a)
 08054b98 dlmgmt_handler (0, fec9edd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 3 / thread# 3  --------------------
 feef0ea1 door     (feb9f980, 410, 0, feb9fe00, f5f00, a)
 08054b98 dlmgmt_handler (0, feb9fdd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 4 / thread# 4  --------------------
 feef0ea1 door     (fea7e980, 410, 0, fea7ee00, f5f00, a)
 08054b98 dlmgmt_handler (0, fea7edd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 5 / thread# 5  --------------------
 feef0ea1 door     (fe80ed90, 18, 0, fe80ee00, f5f00, a)
 08054b98 dlmgmt_handler (0, fe80ede8, 18, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 6 / thread# 6  --------------------
 feef0ea1 door     (fe70f980, 410, 0, fe70fe00, f5f00, a)
 08054b98 dlmgmt_handler (0, fe70fdd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52


-- 
Giovanni