On 08/19/09 14:25, Sebastien Roy wrote:
On Wed, 2009-08-19 at 13:56 -0400, Steffen Weiberle wrote:
I am running build 121 SX-CE on SPARC on a system with four bge
interfaces. I was doing some testing of how many VNICs I can configure,
and in the process tried to 'get rid' of them by replacing the existing
/etc/dladm/datalink.conf file with an original version.
That's not a supported way of doing any administrative operation on
datalinks (/etc/dladm/datalink.conf is a private interface of the
dlmgmtd daemon).
Understood. The initial problem started when I had created N VNICs and
the system hung and would not reboot (at least not quickly enough, so I
chose to boot some other way and replace the datalink.conf file
manually). The problem seems to be that for each VNIC/VLAN, six kernel
threads are created and the system is running out of a resource.
Anyway, the original came from the exact same build of OpenSolaris,
right? The permissions of the file are correct (644 dladm:sys)? You
rebooted after performing this surgery (dlmgmtd only loads the file once
and keeps a cache of its contents in memory)?
No, no, and yes, although after some other surgery, I was able to verify
the *content* of the file against what looked like an original file. I did not
pay close attention to the ownership, and now will!
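For anyone wanting to check the ownership and mode mentioned above (644 dladm:sys), here is a minimal Python sketch. The function and helper names are hypothetical and not part of any OpenSolaris tooling; it only reads the file's metadata and reports mismatches, and it assumes Python is available on the system.

```python
import grp
import os
import pwd
import stat


def owner_name(uid):
    """Map a uid to a user name, falling back to the number as a string."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def group_name(gid):
    """Map a gid to a group name, falling back to the number as a string."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def check_file_attrs(path, want_mode, want_owner, want_group):
    """Return a list of mismatch descriptions (empty if everything matches)."""
    st = os.stat(path)
    problems = []

    # Compare only the permission bits, not the file-type bits.
    mode = stat.S_IMODE(st.st_mode)
    if mode != want_mode:
        problems.append(f"mode is {mode:o}, expected {want_mode:o}")

    owner = owner_name(st.st_uid)
    if owner != want_owner:
        problems.append(f"owner is {owner}, expected {want_owner}")

    group = group_name(st.st_gid)
    if group != want_group:
        problems.append(f"group is {group}, expected {want_group}")

    return problems
```

Calling check_file_attrs("/etc/dladm/datalink.conf", 0o644, "dladm", "sys") would return an empty list when the file matches the values quoted above, and one line per mismatch otherwise.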
Ever since then I have been experiencing intermittent failures with dladm.
Nothing seems completely reproducible.
Just now I deleted a whole bunch of VLANs (4094 of them), and can see
that via dladm show-link. However,
# wc -l /etc/dladm/datalink.conf
4127 /etc/dladm/datalink.conf
suggests otherwise. One symptom is
# dladm rename-link bge3 nic1
dladm: rename operation failed: permission denied
which might be because all the VNICs are still on bge3 (although I would
have thought that would not make a difference when renaming the underlying
data link).
Did you run truss on dlmgmtd to figure out why it's coming back with
EPERM?
Nope, did not know about it. I did truss the dladm operation, and now see
the reason for the door calls!
As I also have S10 installed on the system, part of me wonders whether
fast reboot (init 6) into build 121 does not clear things out, whereas a
reboot into S10 does. (A quick boot -s into S10 5/09 and then back into
121 seems to make the system pause a 'long' time between "Hardware
watchdog enable" and "Hostname...", something that *seems* to go faster
when doing just an init 6 within 121.)
Certainly not related.
After this boot cycle, removing the 4094 VLANs still leaves all the
entries in datalink.conf.
I thought you said that you replaced the datalink.conf file... I'm
confused.
So am I, as I have tried a number of different operations to see how to
recover, short of re-installing. The ownership part may be key.
Is this a bug?
You're already doing something that is unsupportable by manually
manipulating the datalink.conf file, so who knows.
Understood, and done only due to the situation above.
Thanks!
-Seb