[Redirecting to nwam-discuss with permission of the original sender.]

Dave Miner writes:
> This is just a courtesy notice that you guys are now the proud(?) owners 
> of one OpenSolaris release blocker.  Your attention will be appreciated.
[...]
> Some basic investigation and discussion indicates that the svcprop error
> message shown in the log occurs on every non-reconfiguration boot, so it's a
> red herring.  The problem here appears to be nwamd exiting for no apparent
> reason.  Reassigning to networking, marking as release blocker.

I've spent some time with this, and wanted to let everyone else know
the current story.  I'll start looking at it again tomorrow, unless
someone else makes a break in the case.

I was able to replicate the problem when booting from CD-ROM and from
USB on a Tecra A8.

I tried booting with "-m milestone=none", mounting the root file
system with /lib/svc/method/live-fs-root, setting up /etc/syslog.conf
and running /usr/sbin/syslogd, and then running nwamd manually.  It
worked fine; I couldn't reproduce the problem that way.

I tried modifying the USB boot environment to enable debug.  That also
booted just fine; enabling debug seems to perturb the timing just
enough to stop it from failing.

Note to others (in case you want to do this sort of hacking) -- here's
what I did to enable debug with the USB stick mounted on a regular
system:

        gunzip < /media/*/boot/boot_archive > /tmp/arch
        lofiadm -a /tmp/arch
        mkdir /tmp/foo
        mount -F ufs /dev/lofi/1 /tmp/foo
        env SVCCFG_REPOSITORY=/tmp/foo/etc/svc/repository.db svccfg -s nwam
        setprop nwamd/debug=true
        refresh
        exit
        vi /tmp/foo/etc/syslog.conf
        (change daemon.notice to daemon.debug)
        vi /tmp/foo/lib/svc/method/net-nwam
        (add "touch /var/adm/messages ; /usr/sbin/syslogd" before
        invocation of /lib/inet/nwamd)
        umount /tmp/foo
        lofiadm -d /dev/lofi/1
        gzip -9 < /tmp/arch > /media/*/boot/boot_archive

Then unmount the stick and reboot.

Renee mentioned in private email that it seems dladm_open() is what's
failing, which can only mean that the DLD control node is missing for
some reason.
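
For anyone who wants to check for a missing control node on a failing
boot, here is a rough sketch of a shell helper.  The function name is
made up for illustration, and I'm assuming the DLD control node appears
as the character device /dev/dld; verify that path on the build in
question before trusting the result.

```shell
# Hypothetical helper: report whether a character device node exists.
# Returns 0 if the node is present, 1 if it is missing.
check_ctl_node() {
    node="$1"
    if [ -c "$node" ]; then
        echo "present: $node"
        return 0
    fi
    echo "missing: $node"
    return 1
}

# Example use (assumed path, not confirmed by the report above):
#   check_ctl_node /dev/dld
```

Running it from the net-nwam method script just before nwamd starts
would show whether the node is absent at the moment nwamd needs it,
though adding it may shift the timing the same way debug does.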

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 35 Network Drive        71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
