Hark,

Not really. The system installed with this sc_manifest.xml.
I REMOVED the name services section AFTER the configuration ran and the system hung after its reboot. I still have all the name services running (NIS and DNS). Somehow, the copy of sc_manifest in /etc/svc/profile/site causes the system to hang and to dump a core file on every reboot.

This is what happens:
1) install, sc_manifest from install copied to /etc/svc/profile/site.
2) system reboots, and the sc_manifest runs
3) reboot again
4) system hangs
5) reboot from the net
6) remove the indicated section from the mounted zfs root fs
7) reboot
8) system is up with config in place (dns/nis and nsswitch.conf) in multi user, but still core files in "/"

Paul




On 10/26/2011 1:21 AM, Vit Hrachovy wrote:
Hi Paul,
So AI allowed you to install with a manifest that does not specify the name service configuration? Well, this looks like an AI problem that needs to be reported, if it's not in Bugster already.
Cheers
Hark


On 10/25/11 07:13 PM, Paul de Nijs wrote:
Guys,

I think I found a problem.
While trying to diagnose this boot/reboot issue, I was wondering what
those core files were doing in my root directory:

root@x4170-220:/# ls -ltr core*
-rw------- 1 root root 24518571 Oct 24 16:25 core.svc.configd.1319498713.13
-rw------- 1 root root 22427719 Oct 24 16:49 core.svc.configd.1319500140.13
-rw------- 1 root root 22737367 Oct 25 09:07 core.svc.configd.1319558870.13
-rw------- 1 root root 27015995 Oct 25 09:30 core.svc.configd.1319560253.13
-rw------- 1 root root 24498939 Oct 25 09:38 core.svc.configd.1319560711.13
root@x4170-220:/#
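
The numeric parts of these file names can be decoded. A minimal Python sketch, assuming the names follow a core.<program>.<epoch seconds>.<pid> layout (an assumption inferred from this listing, where the trailing 13 matches the PID that pstack reports below; the helper name parse_core_name is hypothetical):

```python
from datetime import datetime, timezone


def parse_core_name(name):
    """Split 'core.<program>.<epoch>.<pid>' into its parts.

    The layout is an assumption. The program name may itself contain
    dots (for example 'svc.configd'), so split from the right.
    """
    base = name.rsplit("/", 1)[-1]          # drop any leading directory
    prefix, epoch, pid = base.rsplit(".", 2)
    if not prefix.startswith("core."):
        raise ValueError("not a core file name: %r" % name)
    program = prefix[len("core."):]
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return program, when, int(pid)


if __name__ == "__main__":
    names = [
        "core.svc.configd.1319498713.13",
        "core.svc.configd.1319560253.13",
    ]
    for n in names:
        program, when, pid = parse_core_name(n)
        print(program, when.isoformat(), pid)
```

Under that assumption, 1319560253 is 16:30:53 UTC on Oct 25, 2011, which lines up with the 09:30 timestamp in the listing if the machine is on US/Pacific time. Each core would then have been written at the moment svc.configd died, not later.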

root@x4170-220:/# pstack core.svc.configd.1319560253.13
core 'core.svc.configd.1319560253.13' of 13: /lib/svc/bin/svc.configd
----------------- lwp# 1 / thread# 1 --------------------
feeb3e97 __sigtimedwait (8047e80, 0, 0, feea2307) + 7
feea231d sigwait (8047e80, 8047e80, 0, 8064760) + 25
08064788 main (1, 8047ed8, 8047ee0, 8047ecc) + 5e4
0805efdd _start (1, 8047f64, 0, 8047f7d, 8047f9a, 8047fb2) + 7d
----------------- lwp# 3 / thread# 3 --------------------
feeb4cc8 __door_unref (ffffffff, ffffffff, ff, 0, fedb1240, fef4d000) + 18
fee9b61b door_unref_func (d, fef4d000, fec6ffe8, feeb00ae) + 47
feeb0101 _thrp_setup (fedb1240) + 9d
feeb03a0 _lwp_start (fedb1240, 0, 0, 0, 0, 0)
----------------- lwp# 4 / thread# 4 --------------------
feeb03db __lwp_park (9030d4c, 80dc0a0) + b
feea9daf cond_wait_queue (9030d4c, 80dc0a0, 0, feeaa2d9) + 63
feeaa351 __cond_wait (9030d4c, 80dc0a0, fea2e9c8, feeaa399) + 89
feeaa3a7 cond_wait (9030d4c, 80dc0a0, 0, feeaa3df) + 27
feeaa3f4 pthread_cond_wait (9030d4c, 80dc0a0, fea2ea38, feea8d51) + 24
0807e465 rc_notify_info_wait (9030d08, 9030d5c, fea2eaa4, 2d0) + 131
0806701b client_wait (9030cc8, fea2edf8, 8, fea2eaa0, fea2ed90, 0) + 8f
08067c43 client_switcher (7b, fea2edf8, 8, 0, 0, 8067aa0) + 1a3
feeb4d1e __door_return () + 3e
----------------- lwp# 5 / thread# 5 --------------------
080a8b05 sqliteVdbeExec (906ad88, fea78000, fe92f768, 0) + fd
080a832e sqlite_step (906ad88, fe92f794, fe92f798, fe92f79c) + 4a
08095697 sqlite_exec (8f3a988, 9030e08, 805f270, 0, fe92f7fc, 1) + 9f
08062341 backend_tx_run (8f8c080, 92b3668, 805f270, 0) + 81
080684cd prop_lnk_tbl_delete (fe92f890, fe92f8b0, fe92f88c, 806cdc8) + 65
0806ce78 object_snapshot_attach (8629848, fe92f964, 0, 807a82d) + 144
0807a8d0 rc_attach_snapshot (8629848, 423, 0, fe92f990, fe92fc60, fe92f98c) + 418
0807b0e7 rc_snapshot_attach (9580f3c) + f7
08066bb1 snapshot_attach (8fd83c8, fe92fdf4, 0, 0) + 45
08067878 simple_handler (8fd83c8, fe92fdf4, c, fe92fd88, fe92fd8c, 8066b6c) + 58
08067c43 client_switcher (7c, fe92fdf4, c, 0, 0, 8067aa0) + 1a3
feeb4d1e __door_return () + 3e
----------------- lwp# 6 / thread# 6 --------------------
feeb03db __lwp_park (fea81618, fea81638) + b
feea9daf cond_wait_queue (fea81618, fea81638, fed6ef38, feeaa001) + 63
feeaa1d7 cond_wait_common (fea81618, fea81638, fed6ef38, feeaa41d) + 1e7
feeaa4d1 __cond_timedwait (fea81618, fea81638, fed6efa8, feeaa509) + c5
feeaa51a cond_timedwait (fea81618, fea81638) + 2a
fea581eb umem_update_thread (0, fef4d000, fed6efe8, feeb00ae) + 18f
feeb0101 _thrp_setup (fedb0a40) + 9d
feeb03a0 _lwp_start (fedb0a40, 0, 0, 0, 0, 0)
----------------- lwp# 7 / thread# 7 --------------------
feeb4d01 __door_return (fe6ded90, 4, 0, 0) + 21
08067d4b client_switcher (21f, fe6dedfc, 4, 0, 0, 8067aa0) + 2ab
feeb4d1e __door_return () + 3e
----------------- lwp# 8 / thread# 8 --------------------
feeb4d01 __door_return (fe5dfd90, 4, 0, 0) + 21
08067d4b client_switcher (223, fe5dfdfc, 4, 0, 0, 8067aa0) + 2ab
feeb4d1e __door_return () + 3e
----------------- lwp# 9 / thread# 9 --------------------
feeb4d01 __door_return (fe4e0d88, 4, 0, 0) + 21
08067d4b client_switcher (90, fe4e0df4, c, 0, 0, 8067aa0) + 2ab
feeb4d1e __door_return () + 3e
----------------- lwp# 10 / thread# 10 --------------------
fee2a030 strlen (0, 80c1f98, 1, 0) + 30
08072cce map_granted_status (0, 8f16f08, fe3e18d0, 80777ec) + 26
0807792c rc_node_create_child_pg (955b2e4, 6, fe3e1d0c, fe3e1d84, 1, 89c0bdc) + 29c
080661cd entity_create_pg (8f2aa48, fe3e1cf8, 0, 0) + 89
08067878 simple_handler (8f2aa48, fe3e1cf8, 108, fe3e1c8c, fe3e1c90, 8066144) + 58
08067c43 client_switcher (54a, fe3e1cf8, 108, 0, 0, 8067aa0) + 1a3
feeb4d1e __door_return () + 3e
----------------- lwp# 11 / thread# 11 --------------------
feeb4d01 __door_return (fe2e2d90, 4, 0, 0) + 21
08067d4b client_switcher (2bf, fe2e2dfc, 4, 0, 0, 8067aa0) + 2ab
feeb4d1e __door_return () + 3e
----------------- lwp# 12 / thread# 12 --------------------
feeb4d01 __door_return (fe1e3d90, 4, 0, 0) + 21
08067d4b client_switcher (247, fe1e3dfc, 4, 0, 0, 8067aa0) + 2ab
feeb4d1e __door_return () + 3e
root@x4170-220:/#
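
Most of the threads in this trace are parked or waiting for door calls. A minimal Python sketch for picking out the ones that were doing something else when the core was written; the set of "idle" top frames is a judgment call for this particular trace, and the function names are hypothetical:

```python
import re

# Top frames that only mean "this thread is parked or waiting for work".
# Chosen by eye for this trace; not an authoritative list.
IDLE_TOP_FRAMES = {"__sigtimedwait", "__door_unref", "__lwp_park", "__door_return"}

THREAD_RE = re.compile(r"^-+\s+lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)\s+-+$")
FRAME_RE = re.compile(r"^[0-9a-f]{8,16}\s+(\S+)\s*\((.*)")


def top_frames(pstack_text):
    """Return {lwp id: (function, raw argument text)} for each thread's top frame."""
    result = {}
    current = None
    for line in pstack_text.splitlines():
        line = line.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None and current not in result:
            args = frame.group(2).split(")", 1)[0]
            result[current] = (frame.group(1), args)
    return result


def busy_threads(pstack_text):
    """Keep only the threads whose top frame is not a known idle/wait routine."""
    return {lwp: frame for lwp, frame in top_frames(pstack_text).items()
            if frame[0] not in IDLE_TOP_FRAMES}


# A few lines copied from the trace above, for illustration.
SAMPLE = """\
----------------- lwp# 1 / thread# 1 --------------------
feeb3e97 __sigtimedwait (8047e80, 0, 0, feea2307) + 7
feea231d sigwait (8047e80, 8047e80, 0, 8064760) + 25
----------------- lwp# 5 / thread# 5 --------------------
080a8b05 sqliteVdbeExec (906ad88, fea78000, fe92f768, 0) + fd
080a832e sqlite_step (906ad88, fe92f794, fe92f798, fe92f79c) + 4a
----------------- lwp# 10 / thread# 10 --------------------
fee2a030 strlen (0, 80c1f98, 1, 0) + 30
08072cce map_granted_status (0, 8f16f08, fe3e18d0, 80777ec) + 26
----------------- lwp# 11 / thread# 11 --------------------
feeb4d01 __door_return (fe2e2d90, 4, 0, 0) + 21
"""

if __name__ == "__main__":
    for lwp, (func, args) in sorted(busy_threads(SAMPLE).items()):
        print("lwp %d: %s(%s)" % (lwp, func, args))
```

Applied to the full trace, this filter would leave lwp 5 (inside sqliteVdbeExec) and lwp 10 (inside strlen, called from map_granted_status). In lwp 10 the first argument shown for both frames is 0, which would be consistent with a NULL string reaching strlen. The argument values pstack prints are not always reliable, so that is a lead to follow up, not a diagnosis.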

root@x4170-220:/# strings core.svc.configd.1319560253.13
---- snip ----
CREATE TABLE pg_tbl (pg_id INTEGER PRIMARY KEY,pg_parent_id INTEGER NOT
NULL,pg_name CHAR(256) NOT NULL,pg_type CHAR(256) NOT NULL,pg_flags
INTEGER NOT NULL,pg_ge
table
prop_lnk_tbl
--- snip ----
/system/volatile/smf_integrity.db
strcmp(db_file, REPOSITORY_DB) == 0
Background integrity check failed.
Could not open %s for background integrity check.
out of memory running integrity check
PRAGMA integrity_check;
is in:

svc.configd: smf(5) database integrity check of:
%s%s
failed. The database might be damaged or a media error might have
prevented it from being verified. Additional information useful to
your service provider %s%s
The system will not be able to boot until you have restored a working
database. svc.startd(1M) will provide a sulogin(1M) prompt for recovery
purposes. The command:
/lib/svc/bin/restore_repository
can be run to restore a backup version of your repository. See
http://sun.com/msg/SMF-8000-MY for more information.
%s%s: integrity check failed.
%s%s: integrity check failed. Details in %s
: PRAGMA integrity_check; failed. Results:
(copied from %s)
/system/volatile/db_errors
Failed to check SMF repository version.
SMF repository upgrade failed.
/system/volatile
Failed to get SMF repository version: %s
Failed to get SMF repository version: %s
no such table: schema_version
Failed to open repository at %s: %s
/lib/svc/bin/repo_upgrade
fork in do_upgrade failed: %s
Failed repo_upgrade
Inheritable
Could not activate repo_upgrade process contract template: %s
Could not set process contract term: %s
repo_upgrade
Could not open process contract template: %s
/system/contract/process/template
Unable to open "%s". %s
Backend switch failed: sqlite_open %s: %s
Backend switch failed: integrity check %s: %s
Backend switch failed: strdup %s: %s
/system/volatile/fast_repository.db
/etc/svc/repository.db
SELECT schema_version FROM schema_version;
Backend copy failed: remove %s: %s
Backend copy failed: rename %s to %s: %s
Backend copy failed: incomplete copy
Backend copy failed: fstat %s: %s
Backend copy failed: opening %s: %s
Backend copy failed: mkstemp %s: %s
Backend copy failed: strlcat %s: overflow
sz < PATH_MAX
result == REP_PROTOCOL_SUCCESS
be == &be_info[backend_id]
MUTEX_HELD(&be->be_lock)
t == BACKEND_TYPE_NORMAL
be != NULL
t == BACKEND_TYPE_NORMAL || t == BACKEND_TYPE_NONPERSIST
unable to create "%s" backup of "%s"
reopening %s: %s
be == bes[BACKEND_TYPE_NORMAL]
"%s" backup completed, but removing old file "%s" failed: %s
"%s" backup completed, but updating "%s" symlink to "%s" failed: %s
"%s" backup failed: rename(%s, %s): %s
"%s" backup failed: mkstemp(%s): %s
"%s" backup failed: opening %s: %s
-%Y%m%d_%H%M%S
"%s" backup failed: localtime(3C) failed: %s
-tmpXXXXXX
boot
Backend copy failed: fails to write to %s at offset %d: %s
Backend copy failed: fails to read from %s at offset %d: %s
/system/volatile/checkpoint_repository.db
--- snip ---

So probably a problem with my SMF database?

What the heck, so I booted off the network, mounted
'rpool/ROOT/solaris' under /tmp/lala, and edited the sc_manifest that was
used for installation (I only had one), which is in
/etc/svc/profile/site/profile_sc_manifest.xa8N_w.xml. I commented out
the name services part (shown in red in the original mail; it is the
block inside <!-- --> below):

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE service_bundle SYSTEM
'/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle name="system configuration" type="profile">
<service name="system/config-user" type="service" version="1">
<instance name="default" enabled="true">
<property_group name="root_account" type="application">
<propval name="password" type="astring" value="3jGcBJNJnPltU" />
<propval name="type" type="astring" value="role" />
</property_group>
<property_group name="user_account" type="application">
<propval name="login" type="astring" value="admin" />
<propval name="password" type="astring" value="3jGcBJNJnPltU" />
<propval name="description" type="astring" value="default user that is" />
<propval name="shell" type="astring" value="/bin/ksh" />
<propval name="uid" type="astring" value="101" />
<propval name="gid" type="astring" value="10" />
<propval name="type" type="astring" value="normal" />
<propval name="roles" type="astring" value="root" />
</property_group>
</instance>
</service>
<service name="system/console-login" type="service" version="1">
<property_group name="ttymon" type="application">
<propval name="terminal_type" type="astring" value="vt100" />
</property_group>
</service>
<service name="system/keymap" type="service" version="1">
<instance name="default" enabled="true">
<property_group name="keymap" type="system">
<propval name="layout" type="astring" value="US-English" />
</property_group>
</instance>
</service>
<service name="system/identity" version="1">
<instance name="node" enabled="true">
<property_group name="config">
<propval name="nodename" value="x4170-220" />
</property_group>
</instance>
</service>
<service name="system/timezone" version="1">
<instance name="default" enabled="true">
<property_group name="timezone">
<propval name="localtime" value="US/Pacific" />
</property_group>
</instance>
</service>
<service name="network/physical" version="1">
<instance name="default" enabled="true">
<property_group name="netcfg" type="application">
<propval name="active_ncp" type="astring" value="DefaultFixed" />
</property_group>
</instance>
</service>
<service name="network/install" type="service" version="1">
<instance name="default" enabled="true">
<property_group name="install_ipv4_interface" type="application">
<propval name="name" type="astring" value="net0/v4" />
<propval name="address_type" type="astring" value="static" />
<propval name="static_address" type="net_address_v4"
value="10.137.234.114/21" />
<propval name="default_route" type="net_address_v4" value="10.137.232.1" />
</property_group>
<property_group name="install_ipv6_interface" type="application">
<propval name="name" type="astring" value="net0/v6" />
<propval name="address_type" type="astring" value="addrconf" />
<propval name="stateless" type="astring" value="yes" />
<propval name="statefull" type="astring" value="yes" />
</property_group>
</instance>
</service>
<!--
<service name="network/nis/domain" version="1">
<instance name="default" enabled="true" />
<property_group name="config" type="application">
<property name="ypservers" type="net_address">
<net_address_list>
<value_node value="10.137.232.10" />
<value_node value="10.137.232.15" />
<value_node value="10.137.232.16" />
<value_node value="10.137.232.30" />
<value_node value="10.196.232.10" />
<value_node value="10.196.231.15" />
<value_node value="10.196.231.16" />
<value_node value="10.196.231.30" />
</net_address_list>
</property>
<propval name="domainname" type="hostname" value="SAE.West.Sun.COM" />
</property_group>
</service>
<service name="network/nis/client" type="service" version="1">
<instance name="default" enabled="true" />
</service>
<service name="network/dns/client" version="1">
<instance name="default" enabled="true" />
<property_group name="config">
<property name="domain" type="astring">
<astring_list>
<value_node value="saelab.us.oracle.com" />
</astring_list>
</property>
<property name="options" type="astring">
<astring_list>
<value_node value="ndots:2" />
</astring_list>
</property>
<property name="nameserver" type="net_address">
<net_address_list>
<value_node value="10.137.232.10" />
<value_node value="10.137.232.15" />
<value_node value="10.137.232.16" />
</net_address_list>
</property>
<property name="search" type="astring">
<astring_list>
<value_node value="saelab.us.oracle.com" />
<value_node value="us.oracle.com" />
<value_node value="oracle.com" />
<value_node value="oraclecorp.com" />
<value_node value="west.sun.com" />
<value_node value="sun.com" />
</astring_list>
</property>
</property_group>
</service>
<service name="system/name-service/cache" type="service" version="1">
<instance name="default" enabled="true" />
</service>
<service name="system/name-service/switch" type="service" version="1">
<instance name="default" enabled="true" />
<property_group name="config" type="application">
<propval name="default" type="astring" value="files nis" />
<propval name="host" type="astring" value="files nis dns" />
<propval name="password" type="astring" value="files nis" />
<propval name="group" type="astring" value="files nis" />
<propval name="network" type="astring" value="files nis" />
<propval name="netmask" type="astring" value="files nis" />
<propval name="automount" type="astring" value="files nis" />
<propval name="alias" type="astring" value="files nis" />
<propval name="printer" type="astring" value="user nis" />
<propval name="auth_attr" type="astring" value="user nis" />
<propval name="prof_attr" type="astring" value="user nis" />
</property_group>
</service>
-->
</service_bundle>
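
One structural difference is visible in this profile: most of the active services nest their property groups inside <instance> (system/console-login is the exception), while every property group in the block inside <!-- --> is a direct child of <service>, next to a self-closed <instance>. Whether that matters to svc.configd is not established here. A minimal Python sketch for listing that placement per service, using only the standard library; the function name and the trimmed sample are illustrative, and the commented-out block has to be uncommented before it can be parsed:

```python
import xml.etree.ElementTree as ET


def summarize_profile(xml_text):
    """Map each service name to where its property groups are declared."""
    root = ET.fromstring(xml_text)
    summary = {}
    for service in root.iter("service"):
        # Property groups that are direct children of <service>.
        service_level = [pg.get("name")
                         for pg in service.findall("property_group")]
        # Property groups nested inside an <instance> of this service.
        instance_level = [pg.get("name")
                          for inst in service.findall("instance")
                          for pg in inst.findall("property_group")]
        summary[service.get("name")] = {
            "service_level": service_level,
            "instance_level": instance_level,
        }
    return summary


# Trimmed sample based on the profile above, with the comment markers removed.
SAMPLE = """\
<service_bundle name="system configuration" type="profile">
  <service name="system/timezone" version="1">
    <instance name="default" enabled="true">
      <property_group name="timezone">
        <propval name="localtime" value="US/Pacific" />
      </property_group>
    </instance>
  </service>
  <service name="network/nis/client" type="service" version="1">
    <instance name="default" enabled="true" />
  </service>
  <service name="system/name-service/switch" type="service" version="1">
    <instance name="default" enabled="true" />
    <property_group name="config" type="application">
      <propval name="host" type="astring" value="files nis dns" />
    </property_group>
  </service>
</service_bundle>
"""

if __name__ == "__main__":
    for name, placement in summarize_profile(SAMPLE).items():
        print(name, placement)
```

A listing like this only describes the file. It does not validate the profile against the service_bundle DTD.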

Reboot, and it comes up.

BTW, this snapshot is taken from the booted system I have problems with ....
I still have a coredump though:

root@x4170-220:/# ls -ltr /core*
-rw------- 1 root root 24518571 Oct 24 16:25 /core.svc.configd.1319498713.13
-rw------- 1 root root 22427719 Oct 24 16:49 /core.svc.configd.1319500140.13
-rw------- 1 root root 22737367 Oct 25 09:07 /core.svc.configd.1319558870.13
-rw------- 1 root root 27015995 Oct 25 09:30 /core.svc.configd.1319560253.13
-rw------- 1 root root 24498939 Oct 25 09:38 /core.svc.configd.1319560711.13
root@x4170-220:/# who -r
. run-level 3 Oct 25 09:38 3 0 S
root@x4170-220:/#

Go figure. Is this an AI problem or a Solaris problem?

Thanks

Paul





On 10/25/11 01:20 AM, Vit Hrachovy wrote:
Hi Paul,
we're regularly testing on x4170m2 and have not seen the described issues
with snv_175a/b so far. We don't install OEL on the second disk, though.

Does SP/ILOM report any HW issue? Is it able to boot OEL successfully?
What kind of S11 install did you use (media GUI|textinstall, networked
AI or textinstall)?

You may also add -k to the grub options to be able to drop to KMDB for
further inspection of the hung system.

If you're able to provide the verbose messages output to the console
prior and up to the hang, it could be helpful.

Regards
Hark
Solaris System Test

On 10/24/11 07:43 PM, Paul de Nijs wrote:
Guys,

Just wondering if this is a known problem....

Installed a fresh system (x4170M2) with build 175a. Because we saw some
problems with a boot 'hang' after the console displays 'Hostname:
.....', we disabled fastboot, assuming there was a problem with it, and
also because we installed OEL6.1 on the second disk, so that we always
get the Solaris grub menu to choose between Solaris and OEL.

Last Friday we powered the system down (I think we were in Linux then).
Anyway, it was a clean shutdown (common practice)

This morning I powered the system back up and, because we have seen this
boot issue on other systems, I wanted to see what messages I would
get with the '-sv' option in the grub menu.
It spits out the stuff you expect, but again, it 'hangs' after
displaying "Hostname: ..."
No messages after that.

Note that we have the console on /dev/ttya (linked to /dev/console I
expect). There is nothing on the JavaRConsole.

We are kind of stuck here, since I don't even know where to look to fix this. A boot from the network, mounting the root zfs and a quick glance
at /var/adm/messages doesn't reveal anything.

BTW, we have seen this problem on SPARC (T4-4) as well. The workaround
there is to boot in single-user mode from the OBP prompt and then a simple
'exit' to go to multi-user. That has worked so far.

If someone wants to take a look, please contact me

Thanks

Paul





--



Paul de Nijs | Principal Software Engineer | Performance Technologies |
+1.503.495.7882
Oracle Strategic Applications Engineering (SAE)
3295 NW 211th Terrace | Hillsboro, OR 97124-7110




_______________________________________________
caiman-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
