On 08/19/09 14:25, Sebastien Roy wrote:
On Wed, 2009-08-19 at 13:56 -0400, Steffen Weiberle wrote:
I am running build 121 SX-CE on SPARC on a system with four bge interfaces. I was testing how many VNICs I can configure, and in the process tried to 'get rid' of them by replacing the existing /etc/dladm/datalink.conf file with an original version.

That's not a supported way of doing any administrative operation on
datalinks (/etc/dladm/datalink.conf is a private interface of the
dlmgmtd daemon).

Understood. The initial problem started when I had created N VNICs and the system hung and would not reboot (at least not quickly enough, so I booted another way and replaced the datalink.conf file manually). The problem seems to be that for each VNIC/VLAN, six kernel threads are created and the system runs out of some resource.

Anyway, the original came from the exact same build of OpenSolaris,
right?  The permissions of the file are correct (644 dladm:sys)?  You
rebooted after performing this surgery (dlmgmtd only loads the file once
and keeps a cache of its contents in memory)?
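
For reference, the mode and ownership mentioned above (644 dladm:sys) can be checked without relying on any particular stat(1) implementation. This is only a minimal sketch: perms_of is a hypothetical helper name, and it simply parses the output of ls -ld.

```shell
# Hypothetical helper: print "<mode> <owner>:<group>" for a file,
# parsed from ls -ld output. Only the first 10 characters of the mode
# column are kept, so a trailing ACL marker such as "+" is dropped.
perms_of() {
    ls -ld "$1" | awk '{ print substr($1, 1, 10), $3 ":" $4 }'
}
```

If the file matches what is described above, perms_of /etc/dladm/datalink.conf would print "-rw-r--r-- dladm:sys".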

No, no, and yes, although after some other surgery, I was able to verify the *content* of the file against what looked like an original file. I did not pay close attention to the ownership, and now will!

Ever since then I am experiencing intermittent failures with dladm.

Nothing seems completely reproducible.

Just now I deleted a whole bunch of VLANs (4094 of them), and can see that via dladm show-link. However,

# wc -l /etc/dladm/datalink.conf
     4127 /etc/dladm/datalink.conf

suggests otherwise. One symptom is

# dladm rename-link bge3 nic1
dladm: rename operation failed: permission denied

which might be because all the VNICs are still on bge3 (although I would have thought that would not make a difference--to rename the underlying data link).

Did you run truss on dlmgmtd to figure out why it's coming back with
EPERM?

Nope, did not know about it. I did truss a dladm operation, and now see the reason for the door calls!
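
As a rough sketch of the suggestion above: truss reports a failed system call as "Err#<errno> <NAME>", so once the daemon's calls have been captured to a file, the EPERM failures can be pulled out with grep. The helper name eperm_calls and the output path used below are assumptions for illustration, and the daemon may also reject a request without any failing system call, in which case nothing will match.

```shell
# Hypothetical helper: list the system calls in a truss output file
# that failed with EPERM (errno 1).
eperm_calls() {
    grep 'Err#1 EPERM' "$1"
}
```

On the affected system one could, as root, run truss -f -o /tmp/dlmgmtd.truss -p "$(pgrep -x dlmgmtd)" in one terminal, repeat the failing dladm rename-link in another, interrupt truss, and then run eperm_calls /tmp/dlmgmtd.truss.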

As I also have S10 installed on the system, part of me wonders whether fast reboot (init 6) into build 121 does not clear things out, whereas a reboot into S10 does. (A quick boot -s into S10 5/09 and then back into 121 seems to have the system boot delay a 'long' time between "Hardware watchdog enable" and "Hostname...", something that *seems* to go faster when doing just an init 6 within 121.)

Certainly not related.

After this boot cycle, removing the 4094 VLANs still leaves all the entries in datalink.conf.
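
A slightly less crude comparison than wc -l is to count only the lines that look like entries. This sketch assumes that comment lines in datalink.conf start with "#" and that each remaining non-blank line is one entry; the file format is private, so that is a guess, and count_entries is a hypothetical helper name.

```shell
# Hypothetical helper: count lines that are neither blank nor comments.
# Assumes comment lines start with "#".
count_entries() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}
```

The result of count_entries /etc/dladm/datalink.conf can then be set against dladm show-link -p -o link | wc -l. The two numbers need not match exactly, but a difference of several thousand would confirm that the deleted VLANs are still in the file.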

I thought you said that you replaced the datalink.conf file...  I'm
confused.

So am I as I have tried a number of different operations to see how to recover, short of re-installing. The ownership part may be key.

Is this a bug?

You're already doing something that is unsupportable by manually
manipulating the datalink.conf file, so who knows.

Understood, and done only due to the situation above.

Thanks!


-Seb



_______________________________________________
networking-discuss mailing list
[email protected]