debian-ports installer

2012-09-23 Thread Frans van Berckel
Has anyone tried building a debian-ports-sparc64 installer image yet? Or do
you know its status, and what still has to be done to build one?

Thanks,


Frans van Berckel


-- 
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/02dfa5e56c617efbaaf6ae12bb4595e3.squir...@webmail.xs4all.nl



Problems with Debian in LDOM

2012-09-23 Thread Jurij Smakov
Hello,

I recently got access to a T1000 machine running Solaris 10 with LDOMs 
version 1.2, and tried to test the latest Debian installer (Wheezy 
beta 2) by installing it into an LDOM. The installation appears to 
work mostly fine (we have a minor bug preventing automated loading of 
the sunvnet and sunvdc modules), but after the installation finishes, the 
data on the virtual disk (backed, in my case, by a file created with 
mkfile) is randomly corrupted, so on reboot the root file system 
comes up in read-only mode, random binaries/scripts fail to execute, 
etc.

So far I have tried the installation three times. The first two used ext4 
for the root/home filesystems. In these cases I was able to complete the 
installation, but the system was unusable after the reboot due to data 
corruption on disk. I then tried using ext2 only, thinking that it 
might be an ext4 issue, and was not even able to complete the 
installation due to disk errors. Here is the relevant portion of the 
dmesg output saved from this install run:

[0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 4.28.1 2008/02/11 13:04'
[0.00] PROMLIB: Root node compatible: sun4v
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.2.0-3-sparc64 (Debian 3.2.23-1) (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1 Mon Jul 23 03:37:35 UTC 2012
[...]
[ 6935.913467] sunvnet.c:v1.0 (June 25, 2007)
[ 6935.917982] eth0: Sun LDOM vnet 00:14:4f:f8:e2:6f
[ 6935.919764] sunvnet: eth0: PORT ( remote-mac 00:14:4f:f8:27:09 switch-port )
[ 7569.263779] sunvdc.c:v1.0 (June 25, 2007)
[ 7569.265299] sunvdc: vdiska: 41720805 sectors (20371 MB)
[ 7569.267844]  vdiska: vdiska1 vdiska2 vdiska3 vdiska4 vdiska5
[...]
[ 8659.183062] sunvdc: ldc_map_sg() failure, err=-40.
[ 8659.183083] end_request: I/O error, dev vdiska, sector 10969060
[ 8659.778350] sunvdc: ldc_map_sg() failure, err=-40.
[ 8659.778371] end_request: I/O error, dev vdiska, sector 10981396
[ 8659.857056] sunvdc: ldc_map_sg() failure, err=-40.
[...]
[ 8660.515942] Buffer I/O error on device vdiska2, logical block 1349865
[ 8660.515958] lost page write due to I/O error on vdiska2
[...]

Error messages like these are logged in dmesg repeatedly until the 
installer fails and is aborted. The complete dmesg output is available 
at http://www.wooyd.org/tmp/dmesg_ldom.log.
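
For anyone who wants to gauge how widespread the failures are in a saved log, the following is a minimal sketch in Python. The regular expressions are assumptions based only on the message formats quoted above, and summarize_vdc_errors is a hypothetical helper name, not part of any Debian or kernel tool.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the dmesg lines quoted in this mail.
MAP_FAIL = re.compile(r"sunvdc: ldc_map_sg\(\) failure, err=(-?\d+)")
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize_vdc_errors(lines):
    """Count ldc_map_sg() failures per error code and list failing sectors per device."""
    errnos = Counter()
    sectors = {}
    for line in lines:
        match = MAP_FAIL.search(line)
        if match:
            errnos[int(match.group(1))] += 1
            continue
        match = IO_ERROR.search(line)
        if match:
            device, sector = match.group(1), int(match.group(2))
            sectors.setdefault(device, []).append(sector)
    return errnos, sectors


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < dmesg_ldom.log
    counts, failing = summarize_vdc_errors(sys.stdin)
    print(dict(counts))
    for dev, secs in failing.items():
        print(dev, len(secs), "failing sectors, lowest", min(secs), "highest", max(secs))
```

Whether the failing sectors cluster in one region or are spread across the whole disk may help narrow down where the problem lies.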

Questions: does anyone have a properly working setup with Debian (or 
other Linux) in an LDOM? Should my particular setup (combination of 
Solaris/LDOM/kernel version) be supported? Is there anything in the 
LDOM setup I might need to tweak to make it work?

Best regards,
-- 
Jurij Smakov   ju...@wooyd.org
Key: http://www.wooyd.org/pgpkey/  KeyID: C99E03CC





Re: Netra T1 200 watchdog timeouts

2012-09-23 Thread Jurij Smakov
On Sat, Sep 22, 2012 at 12:26:26PM +0100, Richard Mortimer wrote:
 
 
 On 19/09/2012 13:10, Mark Morgan Lloyd wrote:
 Richard Mortimer wrote:
 On 18/09/2012 18:49, Mark Morgan Lloyd wrote:
 Richard Mortimer wrote:
 ... snip ...
 This affects both Lenny and Wheezy but does not affect Squeeze,
 i.e. it
 appears to be a regression. Since this happens in between the OBP boot
 command and SILO's boot prompt, I presume that it is a SILO problem or
 that the installer is doing something odd to the disklabel.
 
 Lenny:1.4.13
 Squeeze: 1.4.14
 Wheezy:1.4.14
 
 I don't see how the LOM firmware would affect this. OBP maybe, but if
 it is a processor watchdog then I doubt it's the LOM. SILO would be my
 first suspect.
 
 SILO is also my suspect (after a lot of fiddling trying to disable lom
 watchdog from OBP etc.) and those are SILO version numbers :-/
 
 Brain wasn't turned on enough to realise that!
 
  From memory I don't think the LOM watchdog is ever enabled in OBP on
 the T1 200. It only ever gets enabled by the device drivers once
 Solaris is running (if the packages you mention below are installed of
 course).
 
 OK but at the same time the README from Solaris patch 110208-21
 explicitly says
 
 5043823  Patch 110208-18 changes watchdog behavior and causes watchdog
 resets when probed
 
 and
 
 4412177  lomlite2 watchdog is not always disabled on reboot - 110208-07
 
 both of which read as though there could be spurious watchdog events
 even without Solaris's intervention. However I note your point about the
 LOM log not showing anything.
 I'm still pretty convinced that the problem you are seeing is
 nothing to do with LOM. I think that both of those are Solaris
 device driver issues too.
 
 
 Should I be raising this as a bug, or can I assume that the people who
 need to know about it are already aware of the issue?
 Given that this affects Wheezy then a Debian bug is certainly in order.
 
 I haven't had time to track the development of Wheezy closely but I
 think that it is pretty much using upstream SILO. I vaguely remember
 a few changes upstream recently for both ext2/4 support and for cpu
 detection. One of those could be causing your problem on the Wheezy
 build.

Well, Mark mentioned that the same issue is encountered in both Wheezy 
and Squeeze SILO versions, which predate the recent ext2/4 changes.

And yes, there haven't been any Debian-specific changes to upstream 
SILO as of version 1.4.14+git20100228-1, uploaded in February 2010. 
Before that we had some Debian-specific patches included.

Mark, if you can try different SILO versions and find out which one 
introduced the regression, that would be great. As far as I can tell, 
releases shipped with the following versions:

Lenny  : 1.4.13a+git20070930-3
Squeeze: 1.4.14+git20100228-1+b1 

Assuming that the failure was introduced between 1.4.13a+git20070930-3 
(Lenny version) and 1.4.14+git20100228-1+b1 (Squeeze version), you 
just have one intermediate version (1.4.14+git20100207-1) to test.

 Given the nature of the problem I think it would be useful to have a
 good description of your installation in the bug. In particular
 filesystem layout (partition table), type (ext2/3/4) etc. may be
 relevant. A copy of the console session would be good to attach too.

Yep, a bug report would be useful. Given that it's the first report like 
this that I've seen and that a simple enough workaround exists, I 
don't think it qualifies as RC.

Best regards,
-- 
Jurij Smakov   ju...@wooyd.org
Key: http://www.wooyd.org/pgpkey/  KeyID: C99E03CC





Re: Netra T1 200 watchdog timeouts

2012-09-23 Thread Mark Morgan Lloyd

Jurij Smakov wrote:


 Should I be raising this as a bug, or can I assume that the people who
 need to know about it are already aware of the issue?

 Given that this affects Wheezy then a Debian bug is certainly in order.


It went in as 688521 at about the same time as you posted. Pity I didn't 
hold off for another hour or so.



 I haven't had time to track the development of Wheezy closely but I
 think that it is pretty much using upstream SILO. I vaguely remember
 a few changes upstream recently for both ext2/4 support and for cpu
 detection. One of those could be causing your problem on the Wheezy
 build.


 Well, Mark mentioned that the same issue is encountered in both Wheezy 
 and Squeeze SILO versions, which predate the recent ext2/4 changes.


Wheezy and Lenny, but not Squeeze. Which is odd in view of the 
(upstream) version numbers, and suggests that it's either something very 
specific to the distro version (e.g. kernel length) or is caused by the 
precise version of the compiler.


 And yes, there haven't been any Debian-specific changes to upstream 
 SILO as of version 1.4.14+git20100228-1, uploaded in February 2010. 
 Before that we had some Debian-specific patches included.


 Mark, if you can try different SILO versions and find out which one 
 introduced the regression, that would be great. As far as I can tell, 
 releases shipped with the following versions:


 Lenny  : 1.4.13a+git20070930-3
 Squeeze: 1.4.14+git20100228-1+b1 

 Assuming that the failure was introduced between 1.4.13a+git20070930-3 
 (Lenny version) and 1.4.14+git20100228-1+b1 (Squeeze version), you 
 just have one intermediate version (1.4.14+git20100207-1) to test.


This is something I've not had to do before; Debian usually just works, 
or I have to go upstream if I want something bleeding-edge. Is this 
syntax right and, in view of the message, what should I have in 
sources.list etc.?


root@firewall3:/home/markMLl# apt-get install silo=1.4.14+git20100228-1+b1
..
E: Version '1.4.14+git20100228-1+b1' for 'silo' was not found


 Given the nature of the problem I think it would be useful to have a
 good description of your installation in the bug. In particular
 filesystem layout (partition table), type (ext2/3/4) etc. may be
 relevant. A copy of the console session would be good to attach too.


 Yep, a bug report would be useful. Given that it's the first report like 
 this that I've seen and that a simple enough workaround exists, I 
 don't think it qualifies as RC.


The problem here is that it mandates having a terminal attached to a 
headless system since auto-boot doesn't work. Setting auto-boot-retry? 
true doesn't help.


I'd have reported this earlier if I'd not spent three weeks messing 
around with the OBP's lom@ and lom!, potential ways of getting at it 
from Linux and looking for the Solaris LOM package :-/


--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]





Re: Problems with Debian in LDOM

2012-09-23 Thread David Miller
From: Jurij Smakov ju...@wooyd.org
Date: Sun, 23 Sep 2012 11:33:38 +0100

 Questions: does anyone have a properly working setup with Debian (or 
 other Linux) in an LDOM? Should my particular setup (combination of 
 Solaris/LDOM/kernel version) be supported? Is there anything in the 
 LDOM setup I might need to tweak to make it work?

I haven't tested LDOM in at least 2 years; it's very likely
bugs have crept in.





Re: Netra T1 200 watchdog timeouts

2012-09-23 Thread Jurij Smakov
On Sun, Sep 23, 2012 at 02:07:46PM +, Mark Morgan Lloyd wrote:
 
 It went in as 688521 at about the same time as you posted. Pity I
 didn't hold off for another hour or so.

Thanks, I'll bcc this response to the bug, let's continue discussion 
there.

Looking at the output you see, though, I doubt that it has anything to 
do with SILO. SILO prints the letters 'S', 'I', 'L' and 'O' 
(appearing before the prompt) as it completes execution of 
different parts of the first-stage loader. As you can see in the code 
(first/first.S), printing 'S' is the first thing the first-stage loader 
does upon startup. The fact that it is not seen in the console output 
suggests that the first-stage loader never even got to run. The line

Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0:a  File and args:

which is normally printed by OBP before control is passed to SILO does 
not appear in the watchdog-reset case either, which, again, is a 
strong sign that the failure happens before SILO has a chance to run.
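
The reasoning above can be turned into a rough check on a captured console session. This is only a sketch in Python: boot_progress is a hypothetical helper, and it assumes that the progress letters show up at the start of the first non-empty line after the "Boot device:" line, which may not hold on every console.

```python
def boot_progress(console_text):
    """Classify how far a boot attempt got, judging only by console output."""
    lines = console_text.splitlines()

    # Find the line OBP prints before passing control to the boot loader.
    handoff = next(
        (i for i, line in enumerate(lines) if line.startswith("Boot device:")),
        None,
    )
    if handoff is None:
        return "no OBP hand-off seen"

    # First non-empty line after the hand-off (assumed location of the letters).
    following = next(
        (ln.strip() for ln in lines[handoff + 1:] if ln.strip()), ""
    )

    # Work out how many of the letters 'S', 'I', 'L', 'O' were printed in order.
    printed = ""
    for letter in "SILO":
        if following.startswith(printed + letter):
            printed += letter
        else:
            break

    if not printed:
        return "hand-off seen, first-stage loader silent"
    if printed == "SILO":
        return "first-stage loader completed"
    return "first-stage loader stopped after '%s'" % printed
```

A session that yields "no OBP hand-off seen" matches the watchdog-reset case described here, where neither the "Boot device:" line nor the 'S' appears.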

In the failure case, how long does it take between you typing 'boot' and
the watchdog reset message being displayed? This doc

http://docs.oracle.com/cd/E19102-01/n240.srvr/817-5481-11/understanding_wdtimer.html

appears to suggest that a stuck watchdog would initiate an XIR after 60 
seconds by default; is that consistent with what you see? What are the 
values of the various variables mentioned there on your system(s)? Does 
increasing the timeout help?

I really can't come up with any reason why it would work for Squeeze 
but not other releases, so testing all suspect SILO versions on the 
same machine would be an interesting experiment.

 This is something I've not had to do before; Debian usually just
 works or I have to go upstream if I want something bleeding-edge.
 Is this syntax right and in view of the message what should I have
 in sources.list etc?
 
 root@firewall3:/home/markMLl# apt-get install silo=1.4.14+git20100228-1+b1
 ..
 E: Version '1.4.14+git20100228-1+b1' for 'silo' was not found

That only works when you have repositories containing older/newer 
packages listed in your /etc/apt/sources.list. Simply adding them 
(without configuring apt pinning appropriately) may mess up too many 
things, so the simplest way is probably to just download the older SILO 
debs (should be available on archive.debian.org) and install them 
using dpkg -i.
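
As a rough illustration of the manual route, the Python sketch below builds the file name a given SILO version would normally have and the matching dpkg command. It assumes the usual name_version_arch.deb naming and the "sparc" architecture string; both helper functions are hypothetical, and the download location is deliberately left out because only the archive host is known here.

```python
def deb_filename(package, version, arch):
    """Return the conventional .deb file name for a package version."""
    # An epoch prefix ("N:") is part of the version but not of the file name.
    if ":" in version:
        version = version.split(":", 1)[1]
    return "%s_%s_%s.deb" % (package, version, arch)


def dpkg_install_command(deb_path):
    """Return the argument list for installing a downloaded .deb as root."""
    return ["dpkg", "-i", deb_path]


if __name__ == "__main__":
    # Versions taken from the list earlier in this thread; arch is an assumption.
    for version in (
        "1.4.13a+git20070930-3",
        "1.4.14+git20100207-1",
        "1.4.14+git20100228-1+b1",
    ):
        name = deb_filename("silo", version, "sparc")
        print(" ".join(dpkg_install_command(name)))
```

After installing each version, SILO would still need to be reinstalled to the boot disk in the usual way before rebooting, otherwise the old boot blocks stay in place.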
 
Best regards,
-- 
Jurij Smakov   ju...@wooyd.org
Key: http://www.wooyd.org/pgpkey/  KeyID: C99E03CC

