Bug#688521: SILO first boot after power-on or reset fails on Netra T1 200

2012-09-26 Thread Mark Morgan Lloyd
All the drives I've tested are 9 GB SCA. Of the discs available to me 
(WD, Seagate, Fujitsu, IBM) no disc that has been branded Compaq works 
adequately, while discs that have been branded Sun or are generic (e.g. 
straight from Seagate) are OK. I have never seen this problem on any 
other systems, Sun or otherwise.


My suspicion is that the variant/version of OBP on these machines 
recognises that a disc has been badged by an OEM, assumes that it's Sun 
without checking in further detail, and tries to do something Sun-specific.


I think this issue should probably be closed since I don't think it's a 
Linux or SILO problem. The most that can be done is to make sure a warning 
is available in an appropriate place in platform-specific notes.


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#688521: SILO first boot after power-on or reset fails on Netra T1 200

2012-09-25 Thread Mark Morgan Lloyd
This is something to do with the type of disc being used, i.e. the exact 
firmware version or similar, rather than the Debian release or the 
version of SILO etc.


I'm using non-Sun SCA discs, badged Compaq but apparently WD. Some work 
OK while others, nominally the same but with a visibly different PCB, fail 
the first boot.


This might still be specific to OBP in a Netra T1, since both types of disc 
boot first time in an Ultra-1. It affects Lenny, Squeeze and Wheezy, and 
possibly other OSes.


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]





Bug#688521: SILO first boot after power-on or reset fails on Netra T1 200

2012-09-24 Thread Mark Morgan Lloyd

 Looking at the output you see, I have doubts that it has anything
 to do with SILO though. SILO prints letters 'S', 'I', 'L' and 'O'
 (appearing before the prompt) after it completes execution of
 different parts of first-stage loader. As you can see in the code
 (first/first.S), printing 'S' is the first thing first-stage loader
 does upon startup. The fact that it is not seen in the console output
 suggests that even first-stage loader never got to run. The line

 Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0:a  File and args:

 which is normally printed by OBP before control is passed to SILO
 does not appear in the watchdog-reset case either, which, again,
 is a strong sign that failure happens before SILO has a chance to run.

OK, but it still boots Squeeze without complaint. And complains when 
booting Lenny.


 In a failure case, how long does it take between you typing 'boot' and
 watchdog reset message being displayed?

About a second.

 This doc
 http://docs.oracle.com/cd/E19102-01/n240.srvr/817-5481-11/understanding_wdtimer.html
 appears to suggest that stuck watchdog would initiate a XIR after 60
 seconds by default, is it consistent with what you see? What are the
 values of various variables mentioned there on your system(s)? Does
 increasing the timeout help?

As far as I can see that's applicable to Solaris and ALOM. The T1 200 
uses the lomlite2 chip.


 I really can't come up with any reason why it would work for Squeeze
 but not other releases, so testing all suspect SILO versions on the
 same machine would be an interesting experiment.

Working backwards through silo_1.4.14+git20120819-1_sparc.deb, 
silo_1.4.14+git20100228-1+b1_sparc.deb, 
silo_1.4.13a+git20070930-3_sparc.deb and silo_1.4.13-1_sparc.deb resulted 
in no change in symptoms. Trying to use silo_1.4.9-1_sparc.deb resulted in 
a system which dumped me straight into BusyBox. Putting the Squeeze disc 
back into the system at that point still worked without complaint.


In case I was doing anything obviously wrong, I was getting each .deb 
using wget and then installing it using dpkg -i.
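
The fetch-and-install sequence described above could be written down as a dry-run plan, so that each tested version is recorded in order. This is only a sketch: it relies on the usual name_version_arch.deb file naming, the base_url argument is a hypothetical placeholder (the thread does not say where the packages were fetched from), and the commands are returned as strings rather than executed.

```python
import shlex


def parse_deb_filename(filename):
    """Split 'name_version_arch.deb' into (name, version, arch)."""
    if not filename.endswith(".deb"):
        raise ValueError(f"not a .deb file name: {filename!r}")
    stem = filename[: -len(".deb")]
    # Neither Debian package names nor versions contain '_',
    # so a well-formed file name has exactly three fields.
    name, version, arch = stem.split("_")
    return name, version, arch


def downgrade_plan(filenames, base_url):
    """Return the shell commands (not executed) for testing each package in turn.

    base_url is a hypothetical download location supplied by the caller.
    """
    plan = []
    for filename in filenames:
        parse_deb_filename(filename)  # raises if the name is malformed
        plan.append("wget " + shlex.quote(base_url + filename))
        plan.append("dpkg -i " + shlex.quote(filename))
    return plan


# The packages named in this message, newest first.
TESTED = [
    "silo_1.4.14+git20120819-1_sparc.deb",
    "silo_1.4.14+git20100228-1+b1_sparc.deb",
    "silo_1.4.13a+git20070930-3_sparc.deb",
    "silo_1.4.13-1_sparc.deb",
    "silo_1.4.9-1_sparc.deb",
]
```

Printing the result of downgrade_plan(TESTED, "http://example.invalid/") gives one wget line and one dpkg -i line per package, which can be pasted into a root shell one pair at a time between boot tests.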


I take Richard's point about it not being caused directly by the LOM 
chip (nothing in its log). The fact that Squeeze (still) works suggests 
that OBP and its variables including nvramrc aren't directly involved. I 
take your point about SILO not being displayed.


Observation (manual transcript follows):

# Booting Squeeze:

OpenBoot 4.0 [...] Ethernet address [...]
ok boot
Bad magic number in disk label
Can't open disk label package
Boot device: disk  File and args:
SILO version 1.4.14
Boot:

# i.e. that works without complaint. Booting Wheezy:

OpenBoot 4.0 [...] Ethernet address [...]
ok boot
[Hex dump here]
Watchdog Reset
Externally Initiated Reset
ok boot
Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0:a  File and args:
SILO Version 1.4.[...]
boot:

I need to go back and check Lenny (which fails to boot) again; that 
'Can't open disk label package' message might be significant.
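
The two transcripts above can be compared mechanically by checking which console markers appear in each boot attempt. The following is a rough sketch, not part of SILO or OBP: the marker strings are copied from the transcripts in this thread, while the function names and the returned labels are made up for illustration.

```python
import re

# Markers as they appear in the console transcripts quoted in this thread.
BOOT_DEVICE = re.compile(r"^Boot device: .*File and args:", re.MULTILINE)
SILO_BANNER = re.compile(r"^SILO [Vv]ersion", re.MULTILINE)
WATCHDOG = "Watchdog Reset"


def split_attempts(transcript):
    """Split a console log into one chunk per 'ok boot' command."""
    parts = re.split(r"^ok boot[ \t]*$", transcript, flags=re.MULTILINE)
    return parts[1:]  # drop everything before the first boot command


def classify_attempt(console_text):
    """Return a coarse, illustrative label for a single boot attempt."""
    if WATCHDOG in console_text:
        if BOOT_DEVICE.search(console_text):
            return "reset after boot device line"
        return "reset before boot device line"
    if SILO_BANNER.search(console_text):
        return "SILO reached"
    if BOOT_DEVICE.search(console_text):
        return "boot device line only"
    return "unknown"


WHEEZY = """OpenBoot 4.0 [...] Ethernet address [...]
ok boot
[Hex dump here]
Watchdog Reset
Externally Initiated Reset
ok boot
Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0:a  File and args:
SILO Version 1.4.[...]
boot:
"""

for attempt in split_attempts(WHEEZY):
    print(classify_attempt(attempt))
```

A label of "reset before boot device line" corresponds to the point made earlier in the thread: the reset happens before OBP announces the boot device, so the first-stage loader has not yet run.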






Bug#688521: SILO first boot after power-on or reset fails on Netra T1 200

2012-09-23 Thread Mark Morgan Lloyd

Package: silo
Version: 1.4.14+git20120819-1
Severity: normal

On a Netra T1 200 with default installation (no desktop packages etc.) 
and with a serial terminal attached to the LOM port as console, the 
first boot command after power-on, reset or reboot fails with a watchdog 
timeout before SILO presents its boot prompt:


lom>
lom>version
LOM version:            v3.10
LOM checksum:           a068
LOM firmware part#      258-7871-16
Microcontroller:        H8/3437S
LOM firmware build      Apr  3 2001 13:04:44
lom>poweron
lom>
LOM event: +4h4m40s host power on
Netra T1 200 (UltraSPARC-IIe 500MHz), No Keyboard
OpenBoot 4.0, 1024 MB memory installed, Serial #51358633.
Ethernet address 0:3:ba:f:ab:a9, Host ID: 830faba9.

ok boot
0015f000f5b8
000d009730a100d61b8ffe9d00050010
f0004200f000420400040010f0004200f000420400030010
f0004200f0004204000200100070807080040001013e
f000a860f000a864
Watchdog Reset
Externally Initiated Reset
ok boot
Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0:a  File and args:
SILO Version 1.4.14
boot:
Allocated 64 Megs of memory at 0x4000 for kernel
Uncompressing image...
[Successful boot here.]


Lomlite2 firmware is at 3.10 and there are no watchdog entries in its log. 
LOM firmware can't be upgraded without using a non-free Solaris package.


This affects SILO on Wheezy and Lenny, but Squeeze boots despite having 
the same version as Wheezy:


Lenny:   1.4.13
Squeeze: 1.4.14
Wheezy:  1.4.14

Kernel and configuration as shipped on CD. Behavior is predictable, and 
the same over several computers of the same model. I have not tried 
building any custom kernels, so can't say whether it depends crucially 
on e.g. the size or number of the kernels in silo.conf.


-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: sparc (sparc64)

Kernel: Linux 3.2.0-3-sparc64
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages silo depends on:
ii  libc6  2.13-35

silo recommends no packages.

silo suggests no packages.

-- Configuration Files:
/etc/silo.conf changed:
root=/dev/sda2
partition=1
default=Linux
read-only
timeout=100
image=/vmlinuz
	label=Linux
	initrd=/initrd.img
image=/vmlinuz.old
	label=LinuxOLD
	initrd=/initrd.img.old
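
For anyone wanting to compare silo.conf between the affected releases, a configuration like the one above can be split into global options and per-image sections. This is a simplified sketch and not SILO's own parser: it assumes that every image= line starts a new section, that '#' begins a comment, and that a bare keyword such as read-only is a flag. The function name is made up for illustration.

```python
def parse_silo_conf(text):
    """Parse silo.conf-style text into (global_options, image_sections).

    Simplified: only 'image=' opens a section; other section types are
    not handled here.
    """
    global_options = {}
    images = []
    current = global_options
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # assumed comment syntax
        if not line:
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        value = value.strip() if sep else True  # bare keyword treated as a flag
        if key == "image":
            current = {"image": value}
            images.append(current)
        else:
            current[key] = value
    return global_options, images


REPORTED_CONF = """\
root=/dev/sda2
partition=1
default=Linux
read-only
timeout=100
image=/vmlinuz
label=Linux
initrd=/initrd.img
image=/vmlinuz.old
label=LinuxOLD
initrd=/initrd.img.old
"""

options, images = parse_silo_conf(REPORTED_CONF)
labels = [image["label"] for image in images]
print(options["default"] in labels)
```

The final line checks that the default= entry names one of the configured image labels, which is one quick sanity check to run on each release's configuration.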


-- no debconf information

