On Mon, May 3, 2010 at 10:11 PM, Giovanni Tirloni
<[email protected]> wrote:
On Sun, May 2, 2010 at 10:56 PM, Giovanni Tirloni
<[email protected]> wrote:
Hello,
I think we've been hit by bug 6908043: dladm show-ether has
stopped showing any interfaces at all.
All details added to
http://www.sysdroid.com/opensolaris/bugs/6908043.txt
The only recent changes were moving the LACP timer from
"short" to "long" on aggr0 (e1000g1+e1000g2) and, a few days
ago, starting to use "dladm show-ether" from Zabbix to monitor
interface status. So I don't know whether this was already
happening and we never noticed, or whether heavy use of dladm
show-ether is triggering the problem.
It turned out that dlmgmtd(1M) was stuck and had to be restarted:
# svcadm restart datalink-management
# dladm show-ether
LINK            PTYPE    STATE    AUTO  SPEED-DUPLEX    PAUSE
e1000g0         current  up       yes   1G-f            bi
e1000g1         current  up       yes   1G-f            bi
e1000g2         current  up       yes   1G-f            bi
e1000g3         current  up       yes   1G-f            bi
I'm still trying to understand what causes it. Perhaps dladm
should have better error reporting in case it doesn't get a
satisfactory answer from /dev/dld.
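
Until dladm reports this itself, a monitoring check could at least treat an empty listing as a failure. The following is only a minimal sketch: has_ether_links is a hypothetical helper name, and it assumes that dladm show-ether prints a header line first and one row per link after it.

```shell
# Hypothetical helper (not part of dladm): reads `dladm show-ether`
# output on stdin. Returns 0 if at least one non-empty row follows
# the header line, 1 if the listing is empty or header-only.
has_ether_links() {
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit (found ? 0 : 1) }'
}

# Possible use from a monitoring script (assumption: dladm is in PATH
# and the caller has sufficient privileges):
#   if ! dladm show-ether | has_ether_links; then
#       echo "dladm show-ether returned no links; dlmgmtd may be stuck" >&2
#   fi
```

This only detects the symptom. It does not tell a stuck dlmgmtd apart from a host that really has no Ethernet links.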
There seems to be a memory leak in dlmgmtd, since it's using 3.9GB of
memory (12% of 32GB).
Should I file a new bug? If anyone is interested, I can send the core
dump.
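
As a sanity check on those figures, the arithmetic can be sketched as below. It assumes the RSS column in the ps output further down reads 4039376 (in KB) and that the machine has exactly 32GB of RAM; kb_to_gb and mem_pct are hypothetical helper names.

```shell
# Convert a size in KB to GB, one decimal place.
kb_to_gb() {
    awk -v kb="$1" 'BEGIN { printf "%.1f\n", kb / (1024 * 1024) }'
}

# Percentage of physical memory used, given RSS in KB and RAM in GB.
mem_pct() {
    awk -v rss_kb="$1" -v ram_gb="$2" \
        'BEGIN { printf "%.1f\n", rss_kb / (ram_gb * 1024 * 1024) * 100 }'
}

kb_to_gb 4039376     # resident size of dlmgmtd in GB
mem_pct 4039376 32   # share of a 32GB machine, in percent
```

Both results agree with the 3.9GB and roughly 12% quoted above. ps presumably computes %MEM against the actual physical memory size, which would explain a small difference in the last digit.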
I also updated the file below with the dtrace output showing the
functions called when you issue a "dladm show-ether" in another
terminal on the same server (it's quite long).
http://www.sysdroid.com/opensolaris/bugs/6908043.txt
# ps aux | grep dlmgmt
USER       PID %CPU %MEM      SZ     RSS TT       S    START  TIME COMMAND
dladm       15  0.0 12.1 4039376 4039376 ?        S   Mar 30  6:47 /sbin/dlmgmtd
# gcore 15
# ls -lh core.15
-rw-r--r-- 1 root root 3.9G 2010-05-03 22:17 core.15
# pstack 15
15: /sbin/dlmgmtd
----------------- lwp# 1 / thread# 1 --------------------
feef0547 pause ()
08053a18 main (1, 8047e50, 8047e58, 8047e0c) + b8
0805326d _start (1, 8047ef0, 0, 8047efe, 8047f0e, 8047f1f) + 7d
----------------- lwp# 2 / thread# 2 --------------------
feef0ea1 door (fec9e980, 410, 0, fec9ee00, f5f00, a)
08054b98 dlmgmt_handler (0, fec9edd8, 28, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
----------------- lwp# 3 / thread# 3 --------------------
feef0ea1 door (feb9f980, 410, 0, feb9fe00, f5f00, a)
08054b98 dlmgmt_handler (0, feb9fdd8, 28, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
----------------- lwp# 4 / thread# 4 --------------------
feef0ea1 door (fea7e980, 410, 0, fea7ee00, f5f00, a)
08054b98 dlmgmt_handler (0, fea7edd8, 28, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
----------------- lwp# 5 / thread# 5 --------------------
feef0ea1 door (fe80ed90, 18, 0, fe80ee00, f5f00, a)
08054b98 dlmgmt_handler (0, fe80ede8, 18, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
----------------- lwp# 6 / thread# 6 --------------------
feef0ea1 door (fe70f980, 410, 0, fe70fe00, f5f00, a)
08054b98 dlmgmt_handler (0, fe70fdd8, 28, 0, 0, 8054a9c) + fc
feef0ed2 __door_return () + 52
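
For comparing several pstack snapshots over time, a small summary may be easier to read than the full stacks. This is only a sketch: summarize_pstack is a hypothetical helper, and it assumes the pstack layout shown above (one "lwp# N" separator line per LWP, one line per stack frame).

```shell
# Hypothetical helper: reads pstack output on stdin and prints how many
# LWPs there are and how many stacks contain a dlmgmt_handler frame.
summarize_pstack() {
    awk '/lwp# /          { lwps++ }
         /dlmgmt_handler/ { handlers++ }
         END { printf "%d lwps, %d in dlmgmt_handler\n", lwps + 0, handlers + 0 }'
}

# Possible use on the live system (PID 15 is dlmgmtd in this thread):
#   pstack 15 | summarize_pstack
```

Fed the pstack output above, it would report 6 lwps, 5 of them in dlmgmt_handler.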
--
Giovanni
------------------------------------------------------------------------
_______________________________________________
networking-discuss mailing list
[email protected]