debian-ports installer
Did anyone try building a debian-ports-sparc64 installer image yet? Or do you know the status, and what still has to be done to build one?

Thanks,
Frans van Berckel

--
To UNSUBSCRIBE, email to debian-sparc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/02dfa5e56c617efbaaf6ae12bb4595e3.squir...@webmail.xs4all.nl
Problems with Debian in LDOM
Hello,

I recently got access to a T1000 machine running Solaris 10 with LDOMs version 1.2, and tried to test the latest Debian installer (Wheezy beta 2) by installing it into an LDOM. The installation appears to work mostly fine (we have a minor bug preventing automated loading of the sunvnet and sunvdc modules), but after the installation finishes, the data on the virtual disk (backed, in my case, by a file created with mkfile) is randomly corrupted, so on reboot the root file system comes up in read-only mode, random binaries/scripts fail to execute, etc.

So far I have tried the installation 3 times. The first two runs used ext4 for the root/home filesystems. In these cases I was able to complete the installation, but the system was unusable after the reboot due to data corruption on disk. I then tried using ext2 only, thinking that it might be an ext4 issue, and was not even able to complete the installation due to disk errors. Relevant portion of the dmesg output saved from this install run:

[0.00] PROMLIB: Sun IEEE Boot Prom 'OBP 4.28.1 2008/02/11 13:04'
[0.00] PROMLIB: Root node compatible: sun4v
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.2.0-3-sparc64 (Debian 3.2.23-1) (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1 Mon Jul 23 03:37:35 UTC 2012
[...]
[ 6935.913467] sunvnet.c:v1.0 (June 25, 2007)
[ 6935.917982] eth0: Sun LDOM vnet 00:14:4f:f8:e2:6f
[ 6935.919764] sunvnet: eth0: PORT ( remote-mac 00:14:4f:f8:27:09 switch-port )
[ 7569.263779] sunvdc.c:v1.0 (June 25, 2007)
[ 7569.265299] sunvdc: vdiska: 41720805 sectors (20371 MB)
[ 7569.267844] vdiska: vdiska1 vdiska2 vdiska3 vdiska4 vdiska5
[...]
[ 8659.183062] sunvdc: ldc_map_sg() failure, err=-40.
[ 8659.183083] end_request: I/O error, dev vdiska, sector 10969060
[ 8659.778350] sunvdc: ldc_map_sg() failure, err=-40.
[ 8659.778371] end_request: I/O error, dev vdiska, sector 10981396
[ 8659.857056] sunvdc: ldc_map_sg() failure, err=-40.
[...]
[ 8660.515942] Buffer I/O error on device vdiska2, logical block 1349865
[ 8660.515958] lost page write due to I/O error on vdiska2
[...]

Error messages like these are logged in dmesg multiple times, until the installer fails and is aborted. The complete dmesg output is available at http://www.wooyd.org/tmp/dmesg_ldom.log.

Questions: does anyone have a properly working setup with Debian (or other Linux) in an LDOM? Should my particular setup (combination of Solaris/LDOM/kernel version) be supported? Is there anything in the LDOM setup I might need to tweak to make it work?

Best regards,
--
Jurij Smakov ju...@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC

Archive: http://lists.debian.org/20120923103338.ga6...@wooyd.org
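To judge how widespread the sunvdc failures are in a saved log such as the one linked in the report, the error lines can be tallied mechanically. What follows is a minimal Python sketch and not part of the original report: the function name summarize_vdc_errors and the embedded SAMPLE text are illustrative, and the regular expressions assume the exact message wording shown in the dmesg excerpt.

```python
import re
from collections import Counter

# Patterns assume the message wording seen in the excerpt from the report.
LDC_FAIL = re.compile(r"sunvdc: ldc_map_sg\(\) failure, err=(-?\d+)")
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
BUF_ERROR = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")

# A few lines copied from the excerpt, used only as sample input.
SAMPLE = """\
[ 8659.183062] sunvdc: ldc_map_sg() failure, err=-40.
[ 8659.183083] end_request: I/O error, dev vdiska, sector 10969060
[ 8659.778350] sunvdc: ldc_map_sg() failure, err=-40.
[ 8659.778371] end_request: I/O error, dev vdiska, sector 10981396
[ 8659.857056] sunvdc: ldc_map_sg() failure, err=-40.
[ 8660.515942] Buffer I/O error on device vdiska2, logical block 1349865
[ 8660.515958] lost page write due to I/O error on vdiska2
"""


def summarize_vdc_errors(dmesg_text):
    """Count ldc_map_sg() failures per error code and collect failing locations."""
    err_codes = Counter()   # error code -> number of ldc_map_sg() failures
    sectors = {}            # device -> sectors named in end_request errors
    blocks = {}             # partition -> logical blocks with buffer I/O errors
    for line in dmesg_text.splitlines():
        match = LDC_FAIL.search(line)
        if match:
            err_codes[int(match.group(1))] += 1
            continue
        match = IO_ERROR.search(line)
        if match:
            sectors.setdefault(match.group(1), []).append(int(match.group(2)))
            continue
        match = BUF_ERROR.search(line)
        if match:
            blocks.setdefault(match.group(1), []).append(int(match.group(2)))
    return err_codes, sectors, blocks


if __name__ == "__main__":
    codes, sectors, blocks = summarize_vdc_errors(SAMPLE)
    print("ldc_map_sg() failures by error code:", dict(codes))
    print("failed sectors by device:", sectors)
    print("failed logical blocks by partition:", blocks)
```

Running the same function over the complete log would show whether every I/O error is preceded by an ldc_map_sg() failure with the same error code, which may help narrow the problem down to the sunvdc/LDC layer rather than the filesystem.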
Re: Netra T1 200 watchdog timeouts
On Sat, Sep 22, 2012 at 12:26:26PM +0100, Richard Mortimer wrote:
> On 19/09/2012 13:10, Mark Morgan Lloyd wrote:
>> Richard Mortimer wrote:
>>> On 18/09/2012 18:49, Mark Morgan Lloyd wrote:
>>>> Richard Mortimer wrote:
>>>>> ... snip ...
>>>> This affects both Lenny and Wheezy but does not affect Squeeze, i.e. it appears to be a regression. Since this happens in between the OBP boot command and SILO's boot prompt, I presume that it is a SILO problem or that the installer is doing something odd to the disklabel.
>>>>
>>>> Lenny:   1.4.13
>>>> Squeeze: 1.4.14
>>>> Wheezy:  1.4.14
>>>
>>> I don't see how the LOM firmware would affect this. OBP maybe, but if it is a processor watchdog then I doubt it's LOM. SILO would be my first suspect.
>>
>> SILO is also my suspect (after a lot of fiddling trying to disable lom watchdog from OBP etc.) and those are SILO version numbers :-/
>
> Brain wasn't turned on enough to realise that!
>
>>> From memory I don't think the LOM watchdog is ever enabled in OBP on the T1 200. It only ever gets enabled by the device drivers once Solaris is running (if the packages you mention below are installed of course).
>>
>> OK but at the same time the README from Solaris patch 110208-21 explicitly says
>>
>> 5043823 Patch 110208-18 changes watchdog behavior and causes watchdog resets when probed
>>
>> and
>>
>> 4412177 lomlite2 watchdog is not always disabled on reboot - 110208-07
>>
>> both of which read as though there could be spurious watchdog events even without Solaris's intervention. However I note your point about the LOM log not showing anything.
>
> I'm still pretty convinced that the problem you are seeing is nothing to do with LOM. I think that both of those are Solaris device driver issues too.
>
>> Should I be raising this as a bug, or can I assume that the people who need to know about it are already aware of the issue?
>
> Given that this affects Wheezy then a Debian bug is certainly in order.
>
> I haven't had time to track the development of Wheezy closely but I think that it is pretty much using upstream SILO. I vaguely remember a few changes upstream recently for both ext2/4 support and for cpu detection. One of those could be causing your problem on the Wheezy build.

Well, Mark mentioned that the same issue is encountered in both Wheezy and Squeeze SILO versions, which predates the recent ext2/4 changes. And yes, there haven't been any Debian-specific changes to upstream SILO as of version 1.4.14+git20100228-1, uploaded in February 2010. Before that we had some Debian-specific patches included.

Mark, if you can try different SILO versions and find out which one introduced the regression, that would be great. As far as I can tell, releases shipped with the following versions:

Lenny  : 1.4.13a+git20070930-3
Squeeze: 1.4.14+git20100228-1+b1

Assuming that the failure was introduced between 1.4.13a+git20070930-3 (Lenny version) and 1.4.14+git20100228-1+b1 (Squeeze version), you just have one intermediate version (1.4.14+git20100207-1) to test.

> Given the nature of the problem I think it would be useful to have a good description of your installation in the bug. In particular filesystem layout (partition table), type (ext2/3/4) etc. may be relevant. A copy of the console session would be good to attach too.

Yep, the bug would be useful. Given that it's the first report like this that I have seen and that a simple enough workaround exists, I don't think it qualifies as RC.

Best regards,
--
Jurij Smakov ju...@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC

Archive: http://lists.debian.org/20120923105458.ga6...@wooyd.org
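The list of SILO versions to try can be written down as a small test plan. The Python sketch below is only an illustration under stated assumptions: it orders the three versions named in this message by the git snapshot date embedded in the version string (this is not Debian's real version comparison), and it assumes the usual package_version_arch.deb file naming with the architecture string "sparc". The names snapshot_date, deb_filename and test_plan are made up for the example.

```python
import re

# Versions taken from the message above.
SHIPPED = {
    "Lenny": "1.4.13a+git20070930-3",
    "Squeeze": "1.4.14+git20100228-1+b1",
}
INTERMEDIATE = ["1.4.14+git20100207-1"]


def snapshot_date(version):
    """Return the YYYYMMDD git snapshot stamp embedded in a SILO version."""
    match = re.search(r"\+git(\d{8})", version)
    if match is None:
        raise ValueError("no +gitYYYYMMDD stamp in %r" % version)
    return match.group(1)


def deb_filename(version, package="silo", arch="sparc"):
    """Build the expected .deb file name (assumed naming: package_version_arch.deb)."""
    return "%s_%s_%s.deb" % (package, version, arch)


def test_plan():
    """List (version, file name) pairs, oldest snapshot first."""
    versions = sorted(set(SHIPPED.values()) | set(INTERMEDIATE), key=snapshot_date)
    return [(version, deb_filename(version)) for version in versions]


if __name__ == "__main__":
    for version, filename in test_plan():
        print(version, "->", filename)
```

Sorting on the snapshot stamp is enough here because all three versions carry one; for arbitrary packages a proper Debian version comparison would be needed instead.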
Re: Netra T1 200 watchdog timeouts
Jurij Smakov wrote:
>>> Should I be raising this as a bug, or can I assume that the people who need to know about it are already aware of the issue?
>>
>> Given that this affects Wheezy then a Debian bug is certainly in order.

It went in as 688521 at about the same time as you posted. Pity I didn't hold off for another hour or so.

>> I haven't had time to track the development of Wheezy closely but I think that it is pretty much using upstream SILO. I vaguely remember a few changes upstream recently for both ext2/4 support and for cpu detection. One of those could be causing your problem on the Wheezy build.
>
> Well, Mark mentioned that the same issue is encountered in both Wheezy and Squeeze SILO versions, which predates the recent ext2/4 changes.

Wheezy and Lenny, but not Squeeze. Which is odd in view of the (upstream) version numbers, and suggests that it's either something very specific to the distro version (e.g. kernel length) or is caused by the precise version of the compiler.

> And yes, there haven't been any Debian-specific changes to upstream SILO as of version 1.4.14+git20100228-1, uploaded in February 2010. Before that we had some Debian-specific patches included.
>
> Mark, if you can try different SILO versions and find out which one introduced the regression, that would be great. As far as I can tell, releases shipped with the following versions:
>
> Lenny  : 1.4.13a+git20070930-3
> Squeeze: 1.4.14+git20100228-1+b1
>
> Assuming that the failure was introduced between 1.4.13a+git20070930-3 (Lenny version) and 1.4.14+git20100228-1+b1 (Squeeze version), you just have one intermediate version (1.4.14+git20100207-1) to test.

This is something I've not had to do before: Debian usually just works, or I have to go upstream if I want something bleeding-edge. Is this syntax right and, in view of the message, what should I have in sources.list etc.?

root@firewall3:/home/markMLl# apt-get install silo=1.4.14+git20100228-1+b1
..
E: Version '1.4.14+git20100228-1+b1' for 'silo' was not found

>> Given the nature of the problem I think it would be useful to have a good description of your installation in the bug. In particular filesystem layout (partition table), type (ext2/3/4) etc. may be relevant. A copy of the console session would be good to attach too.
>
> Yep, the bug would be useful. Given that it's the first report like this that I have seen and that a simple enough workaround exists, I don't think it qualifies as RC.

The problem here is that it mandates having a terminal attached to a headless system since auto-boot doesn't work. Setting auto-boot-retry? true doesn't help.

I'd have reported this earlier if I'd not spent three weeks messing around with the OBP's lom@ and lom!, potential ways of getting at it from Linux and looking for the Solaris LOM package :-/

--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]

Archive: http://lists.debian.org/k3n53i$iss$1...@pye-srv-01.telemetry.co.uk
Re: Problems with Debian in LDOM
From: Jurij Smakov ju...@wooyd.org
Date: Sun, 23 Sep 2012 11:33:38 +0100

> Questions: does anyone have properly working setup with Debian (or other Linux) in an LDOM? Should my particular setup (combination of Solaris/LDOM/kernel version) be supported? Is there anything in the LDOM setup I might need to tweak to make it work?

I haven't tested LDOM in at least 2 years; it's very likely bugs have crept in.

Archive: http://lists.debian.org/20120923.121440.1491137498723341084.da...@davemloft.net
Re: Netra T1 200 watchdog timeouts
On Sun, Sep 23, 2012 at 02:07:46PM +, Mark Morgan Lloyd wrote:
> It went in as 688521 at about the same time as you posted. Pity I didn't hold off for another hour or so.

Thanks, I'll bcc this response to the bug; let's continue the discussion there.

Looking at the output you see, I have doubts that it has anything to do with SILO though. SILO prints the letters 'S', 'I', 'L' and 'O' (appearing before the prompt) as it completes execution of different parts of the first-stage loader. As you can see in the code (first/first.S), printing 'S' is the first thing the first-stage loader does upon startup. The fact that it is not seen in the console output suggests that even the first-stage loader never got to run. The line

Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0:a File and args:

which is normally printed by OBP before control is passed to SILO does not appear in the watchdog-reset case either, which, again, is a strong sign that the failure happens before SILO has a chance to run.

In a failure case, how long does it take between you typing 'boot' and the watchdog reset message being displayed? This doc

http://docs.oracle.com/cd/E19102-01/n240.srvr/817-5481-11/understanding_wdtimer.html

appears to suggest that a stuck watchdog would initiate an XIR after 60 seconds by default. Is that consistent with what you see? What are the values of the various variables mentioned there on your system(s)? Does increasing the timeout help?

I really can't come up with any reason why it would work for Squeeze but not other releases, so testing all suspect SILO versions on the same machine would be an interesting experiment.

> This is something I've not had to do before: Debian usually just works, or I have to go upstream if I want something bleeding-edge. Is this syntax right and, in view of the message, what should I have in sources.list etc.?
>
> root@firewall3:/home/markMLl# apt-get install silo=1.4.14+git20100228-1+b1
> ..
> E: Version '1.4.14+git20100228-1+b1' for 'silo' was not found

That only works when you have repositories containing the older/newer packages listed in your /etc/apt/sources.list. Simply adding them (without configuring apt pinning appropriately) may mess up too many things, so the simplest way is probably to just download the older SILO debs (should be available on archive.debian.org) and install them using dpkg -i.

Best regards,
--
Jurij Smakov ju...@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC

Archive: http://lists.debian.org/20120923170137.ga12...@wooyd.org
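The reasoning in this message about where the boot stops (no "Boot device:" line from OBP, no 'S' from the first-stage loader) can be applied to a saved console capture. The Python sketch below is hypothetical: classify_console and its stage labels are invented for illustration, and it assumes that the loader's progress letters are the first visible characters after the OBP hand-off line, which may not hold for every firmware or console setup.

```python
# Text OBP prints just before handing control to the boot loader,
# and the progress letters the first-stage loader prints, per the message above.
HANDOFF_PREFIX = "Boot device:"
LOADER_LETTERS = "SILO"


def classify_console(console_text):
    """Return (stage, letters_seen) for a captured console session."""
    lines = console_text.splitlines()

    # Find the OBP hand-off line, if any.
    handoff = None
    for index, line in enumerate(lines):
        if line.lstrip().startswith(HANDOFF_PREFIX):
            handoff = index
            break
    if handoff is None:
        # OBP never announced the boot device: the failure precedes the loader.
        return ("before-handoff", "")

    # Assumption: the progress letters directly follow the hand-off line.
    after = "".join(lines[handoff + 1:]).lstrip()
    seen = ""
    for expected, actual in zip(LOADER_LETTERS, after):
        if expected != actual:
            break
        seen += actual

    if seen == "":
        return ("first-stage-not-started", seen)
    if seen != LOADER_LETTERS:
        return ("first-stage-incomplete", seen)
    return ("first-stage-complete", seen)


if __name__ == "__main__":
    # Illustrative capture built around the hand-off line quoted in the message.
    capture = (
        "Boot device: /pci@1f,0/pci@1/scsi@8/disk@0,0:a File and args:\n"
        "SILO\n"
    )
    print(classify_console(capture))
```

A result of "before-handoff" for the failing boots would match the observation above that neither the hand-off line nor the 'S' appears, pointing away from SILO itself.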