Could you please preload libumem to diagnose the memory leak? I think the following should do the trick:

# svcadm disable svc:/network/datalink-management:default
# export LD_PRELOAD=libumem.so
# export UMEM_DEBUG=default
# /sbin/dlmgmtd -d 10 &
# mdb -p `pgrep dlmgmtd`
> ::findleaks


Please post the output of ::findleaks, if there is any.

Thanks
- Cathy
On Mon, May 3, 2010 at 10:11 PM, Giovanni Tirloni <[email protected]> wrote:

    On Sun, May 2, 2010 at 10:56 PM, Giovanni Tirloni
    <[email protected]> wrote:

        Hello,

         I think we've been hit by bug 6908043: dladm show-ether
        stopped showing any interfaces at all.

         All details added to
        http://www.sysdroid.com/opensolaris/bugs/6908043.txt

         The only recent changes were moving LACP from "short" to
        "long" on aggr0 (e1000g1+e1000g2) and, a few days ago,
        starting to use "dladm show-ether" from Zabbix to monitor
        interface status. So I don't know whether this was happening
        before and we never noticed, or whether heavy use of dladm
        show-ether is triggering the problem.


    Turned out dlmgmtd(1M) was stuck and had to be restarted:

    # svcadm restart datalink-management

    # dladm show-ether
    LINK      PTYPE     STATE  AUTO  SPEED-DUPLEX  PAUSE
    e1000g0   current   up     yes   1G-f          bi
    e1000g1   current   up     yes   1G-f          bi
    e1000g2   current   up     yes   1G-f          bi
    e1000g3   current   up     yes   1G-f          bi

    I'm still trying to understand what causes it. Perhaps dladm
    should have better error reporting in case it doesn't get a
    satisfactory answer from /dev/dld.


There seems to be a memory leak in dlmgmtd: it is using 3.9GB of memory (12% of 32GB).

Should I file a new bug? If anyone is interested, I can send the core dump.

I also updated the file below with DTrace output of the functions called when "dladm show-ether" is issued from another terminal on the same server (it's quite long).

  http://www.sysdroid.com/opensolaris/bugs/6908043.txt
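
One way to check whether repeated "dladm show-ether" calls are what drives the growth is to sample the daemon's resident set size before and after a batch of calls. The following is only a minimal sketch: rss_kb and rss_growth are hypothetical helper names made up for illustration, and it assumes a ps that accepts the POSIX "-o rss= -p PID" form and reports kilobytes.

```shell
#!/bin/sh
# Sketch only: rss_kb and rss_growth are hypothetical helpers,
# not part of dladm, dlmgmtd or any system tool.

# rss_kb PID: print the resident set size of PID in kilobytes.
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# rss_growth PID COUNT CMD [ARGS...]
# Run CMD COUNT times, discarding its output, and print how many
# kilobytes the RSS of PID changed over the whole batch.
rss_growth() {
    pid=$1
    count=$2
    shift 2
    before=$(rss_kb "$pid")
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" >/dev/null 2>&1
        i=$((i + 1))
    done
    after=$(rss_kb "$pid")
    echo $((after - before))
}
```

For example, rss_growth "$(pgrep dlmgmtd)" 1000 dladm show-ether would print the RSS change of dlmgmtd across 1000 calls. A value that keeps climbing from batch to batch would point at the show-ether path, while a flat value would suggest the growth comes from somewhere else.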

# ps aux | grep dlmgmt
USER  PID %CPU %MEM      SZ     RSS TT  S  START  TIME COMMAND
dladm  15  0.0 12.1 4039376 4039376 ?   S Mar 30  6:47 /sbin/dlmgmtd

# gcore 15
# ls -lh core.15
-rw-r--r-- 1 root root 3.9G 2010-05-03 22:17 core.15
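
As a sanity check on the figures above, the ps and core file sizes can be tied together with a little arithmetic. This sketch assumes the RSS column in the ps output reads 4039376 KB and that the machine has exactly 32 GiB (33554432 KB) of RAM. kb_to_gib and pct_of are hypothetical helper names. It uses "awk -v", so on Solaris it may need nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk.

```shell
#!/bin/sh
# Sketch only: kb_to_gib and pct_of are hypothetical helpers.

# kb_to_gib KB: print KB kilobytes as GiB with one decimal place.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / 1024 / 1024 }'
}

# pct_of PART_KB TOTAL_KB: print PART as a percentage of TOTAL.
pct_of() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

kb_to_gib 4039376          # prints 3.9
pct_of 4039376 33554432    # prints 12.0
```

The 3.9 matches the size of core.15, and 12.0 is close to the 12.1 that ps reports in %MEM. The small difference is expected if slightly less than the full 32 GiB is counted as usable physical memory.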

# pstack 15
15:    /sbin/dlmgmtd
-----------------  lwp# 1 / thread# 1  --------------------
 feef0547 pause    ()
 08053a18 main     (1, 8047e50, 8047e58, 8047e0c) + b8
 0805326d _start   (1, 8047ef0, 0, 8047efe, 8047f0e, 8047f1f) + 7d
-----------------  lwp# 2 / thread# 2  --------------------
 feef0ea1 door     (fec9e980, 410, 0, fec9ee00, f5f00, a)
 08054b98 dlmgmt_handler (0, fec9edd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 3 / thread# 3  --------------------
 feef0ea1 door     (feb9f980, 410, 0, feb9fe00, f5f00, a)
 08054b98 dlmgmt_handler (0, feb9fdd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 4 / thread# 4  --------------------
 feef0ea1 door     (fea7e980, 410, 0, fea7ee00, f5f00, a)
 08054b98 dlmgmt_handler (0, fea7edd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 5 / thread# 5  --------------------
 feef0ea1 door     (fe80ed90, 18, 0, fe80ee00, f5f00, a)
 08054b98 dlmgmt_handler (0, fe80ede8, 18, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 6 / thread# 6  --------------------
 feef0ea1 door     (fe70f980, 410, 0, fe70fe00, f5f00, a)
 08054b98 dlmgmt_handler (0, fe70fdd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52


--
Giovanni
------------------------------------------------------------------------

_______________________________________________
networking-discuss mailing list
[email protected]
