On Mon, May 3, 2010 at 10:11 PM, Giovanni Tirloni <[email protected]> wrote:
> On Sun, May 2, 2010 at 10:56 PM, Giovanni Tirloni
> <[email protected]> wrote:
>
>> Hello,
>>
>> I think we've been hit by bug 6908043: dladm show-ether stopped showing
>> any interface at all.
>>
>> All details added to
>> http://www.sysdroid.com/opensolaris/bugs/6908043.txt
>>
>> The only recent changes we made were moving LACP from "short" to
>> "long" on aggr0 (e1000g1+e1000g2) and, a few days ago, starting to use
>> "dladm show-ether" in Zabbix to monitor interface status. So I don't know
>> whether it was happening before and we never noticed, or whether heavy use
>> of dladm show-ether is triggering the problem.
>>
>
> It turned out dlmgmtd(1M) was stuck and had to be restarted:
>
> # svcadm restart datalink-management
>
> # dladm show-ether
> LINK            PTYPE    STATE  AUTO  SPEED-DUPLEX   PAUSE
> e1000g0         current  up     yes   1G-f           bi
> e1000g1         current  up     yes   1G-f           bi
> e1000g2         current  up     yes   1G-f           bi
> e1000g3         current  up     yes   1G-f           bi
>
> I'm still trying to understand what causes it. Perhaps dladm should have
> better error reporting when it doesn't get a satisfactory answer from
> /dev/dld.
>

There seems to be a memory leak in dlmgmtd, since it's using 3.9GB of
memory (12% of 32GB). Should I file a new bug? If anyone is interested, I
can send the core dump.

I also updated the file below with DTrace output of the functions called
when you issue "dladm show-ether" in another terminal on the same server
(it's quite long).

http://www.sysdroid.com/opensolaris/bugs/6908043.txt

# ps aux | grep dlmgmt
USER       PID %CPU %MEM      SZ     RSS TT  S  START   TIME COMMAND
dladm       15  0.0 12.1 4039376 4039376 ?   S  Mar 30   6:47 /sbin/dlmgmtd

# gcore 15
# ls -lh core.15
-rw-r--r-- 1 root root 3.9G 2010-05-03 22:17 core.15

# pstack 15
15:     /sbin/dlmgmtd
-----------------  lwp# 1 / thread# 1  --------------------
 feef0547 pause    ()
 08053a18 main     (1, 8047e50, 8047e58, 8047e0c) + b8
 0805326d _start   (1, 8047ef0, 0, 8047efe, 8047f0e, 8047f1f) + 7d
-----------------  lwp# 2 / thread# 2  --------------------
 feef0ea1 door     (fec9e980, 410, 0, fec9ee00, f5f00, a)
 08054b98 dlmgmt_handler (0, fec9edd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 3 / thread# 3  --------------------
 feef0ea1 door     (feb9f980, 410, 0, feb9fe00, f5f00, a)
 08054b98 dlmgmt_handler (0, feb9fdd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 4 / thread# 4  --------------------
 feef0ea1 door     (fea7e980, 410, 0, fea7ee00, f5f00, a)
 08054b98 dlmgmt_handler (0, fea7edd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 5 / thread# 5  --------------------
 feef0ea1 door     (fe80ed90, 18, 0, fe80ee00, f5f00, a)
 08054b98 dlmgmt_handler (0, fe80ede8, 18, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52
-----------------  lwp# 6 / thread# 6  --------------------
 feef0ea1 door     (fe70f980, 410, 0, fe70fe00, f5f00, a)
 08054b98 dlmgmt_handler (0, fe70fdd8, 28, 0, 0, 8054a9c) + fc
 feef0ed2 __door_return () + 52

--
Giovanni
_______________________________________________
networking-discuss mailing list
[email protected]
