from:"Steffen Grunewald"

Install particular version of package?

2024-03-01 Thread Steffen Grunewald

Hi,

a quick question the FAI Guide refused to answer:

Is it possible to specify "package=1.2.3+deb12" in a package_config/CLASS
file for the "instsoft" task, or
do I have to "apt-get -y install package=1.2.3+deb12" during the "configure"
task, in a scripts/CLASS/99downgrade script?

(Also I noticed that the "mountdisks" task, following the "partition" one,
seems to not be mentioned at all. Is my Guide too outdated?)

Thanks,
 Steffen


-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~

Re: Install particular version of package?

2024-03-05 Thread Steffen Grunewald

Hi all,

since I seem to be the only one who ever faced this question, I'll also
be the only one to answer it...

On Fri, 2024-03-01 at 14:05:28 +0100, Steffen Grunewald wrote:
> 
> Is it possible to specify "package=1.2.3+deb12" in a package_config/CLASS
> file for the "instsoft" task, or

It is possible, but mistyping the version (or specifying one that has been
removed from the repository) will result in the infamous "25600 25600" error,
and then not only this particular package will be missing from the
installation but a whole lot more!

> do I have to "apt-get -y install package=1.2.3+deb12" during the "configure"
> task, in a scripts/CLASS/99downgrade script?

This way is much safer, and it saves the trouble of getting things right
when an upgraded package fixes the broken one - provided you check the
installed version ("dpkg -l package | awk '/^ii/ {print $3}'") and perform
the downgrade *only* if it is the broken one.
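
A minimal sketch of such a scripts/CLASS/99downgrade script, under a few
assumptions: PKG, BROKEN and GOOD are placeholders, dpkg-query is used
instead of parsing "dpkg -l", and --allow-downgrades is my addition (current
apt refuses to downgrade with -y alone). $ROOTCMD and $FAI_ACTION are the
variables FAI sets during an installation.

```shell
#!/bin/sh
# Hypothetical placeholders, not values from a real installation.
PKG=package
BROKEN=1.2.4+deb12
GOOD=1.2.3+deb12

# Print the installed version of package $1, or nothing if it is not installed.
# With $ROOTCMD set by FAI this queries the target system.
installed_version() {
    ${ROOTCMD:-} dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}

# Succeed only if the installed version $1 is exactly the known-broken one $2.
needs_downgrade() {
    [ "$1" = "$2" ]
}

# Only act during a real FAI install run.
if [ "${FAI_ACTION:-}" = "install" ]; then
    if needs_downgrade "$(installed_version "$PKG")" "$BROKEN"; then
        $ROOTCMD apt-get -y --allow-downgrades install "$PKG=$GOOD"
    fi
fi
```

Once a fixed version reaches the repository, the comparison no longer
matches and the script does nothing, so it can stay in the config space.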

HTH someone else,

Steffen

Re: bookworm versus noble - i40e netdevice name discrepancy

2024-06-04 Thread Steffen Grunewald

On Tue, 2024-06-04 at 14:02:34 +0200, Thomas Lange wrote:
> Now the question is if this is a change because of v255, or if Ubuntu
> is doing things different to Debian. After reading the URL below I
> guess it's because of the version.
> Currently I have no idea how to properly fix this. I would use some
> magic shell/sed commands for this. A proper fix would be to use a
> newer systemd version during the FAI installation, but that's not
> easy/possible currently.

What about extracting the interface names used during FAI installation,
and creating {/etc,/lib}/systemd/network/*.link files that would be used
by systemd to rename the interfaces found at system startup?

man 5 systemd.link
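
Roughly what I have in mind, as an untested sketch: walk through the
interfaces seen during installation and write one .link file per physical
device, matching on the MAC address. The file name prefix "10-" and the idea
of calling this from a FAI script with $target are assumptions.

```shell
#!/bin/sh
# Print a systemd .link unit that pins interface name $1 to MAC address $2.
link_unit() {
    printf '[Match]\nMACAddress=%s\n\n[Link]\nName=%s\n' "$2" "$1"
}

# Write one .link file per physical interface into directory $1, using the
# names and MAC addresses found under sysfs root $2 (default /sys).
write_link_files() {
    outdir=$1
    sysroot=${2:-/sys}
    for dev in "$sysroot"/class/net/*; do
        [ -e "$dev/device" ] || continue   # skip lo and other virtual devices
        name=${dev##*/}
        mac=$(cat "$dev/address")
        link_unit "$name" "$mac" > "$outdir/10-$name.link"
    done
}
```

In a FAI script this could be called as
write_link_files "$target/etc/systemd/network" after creating that directory.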

Best,
 Steffen


Re: upgrade FAI 2.10.2 to FAI 3

2007-04-16 Thread Steffen Grunewald

On Tue, Apr 10, 2007 at 11:33:43AM +0200, Rudy Gevaert wrote:
> I don't see too many problems, except:
> - my configspace depends a bit on sarge
> - serving/creating the sarge nfsroot

Just keep it (a copy of it).

> Does FAI 3 support serving several nfsroot out of the box?

It's not FAI that manages the nfsroot - it's the TFTP server, and,
yes, tftpd-hpa has been doing this for quite some time (I'm running
three nfsroots for different architectures)

Cheers,
 Steffen

(ATM doing the opposite: serving an etch amd64 nfsroot from a sarge
i386 machine :-)

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html

X7DBR woes, was Re: FAI kernel won't boot on certain boxes

2007-05-10 Thread Steffen Grunewald

On Wed, May 09, 2007 at 06:38:49PM +0200, Thomas Lange wrote:
> >>>>> On Wed, 09 May 2007 18:12:43 +0200, Frank Doepper <[EMAIL PROTECTED]> 
> >>>>> said:
> 
> > But both, the out-of-the-box kernel and the self-compiled kernel, won't
> > boot on a certain variant of "transtec" Intel Celeron PC. It says
> > "Loading vmlinuz-install..Ready." and dies.
> Mmm. Which variant?
> I only know of problems with some e1000 cards, where the fai
> kernels hangs when doing DHCP request.

Since I'm trying to debug one of my servers too, this might be useful to 
someone:

Machine is transtec, X7DBR based, dual Woodcrest 5060. First disk is 
vanilla SATA, three others connected to 3ware 9500S as RAID. There's 
a SIMSO+ IPMI card, connected to its own ethernet port.

Using the Debian Etch Multi-Arch netinstall CD, I can setup a full system
(all software options selected) without any problems, no difference whether
I use eth0 or eth1.

If I try to FAI install, I get MD5sum errors from random packages.
If I try to FAI sysinfo, the machine sends back result files (often incomplete)
and dies instead of waiting for RETURN (in both cases, there is no "reboot"
in the FAI commandline).

Etch netinstall uses version 7.1.9 of the Intel/PRO1000 driver while FAI
kernel 2.6.18-4 already has a higher number.
I tried to use my own homegrown 2.6.21.1 FAI kernel but there seems to be
no big difference.

Is there a hardware bug known with this particular board?

Is there an archive of "old" FAI kernels, and which e1000 drivers do these
use?

Cheers,
 Steffen


Re: X7DBR woes, was Re: FAI kernel won't boot on certain boxes

2007-05-10 Thread Steffen Grunewald

On Thu, May 10, 2007 at 02:18:11PM +0200, Thomas Lange wrote:
> >>>>> On Thu, 10 May 2007 09:01:58 +0200, Steffen Grunewald <[EMAIL 
> >>>>> PROTECTED]> said:
> 
> > If I try to FAI install, I get MD5sum errors from random packages.
> Are you using NFS over UDP? Try NFS over TCP.
HTTP ...

> > Is there an archive of "old" FAI kernels, and which e1000 drivers do these
> see http://snapshot.debian.net/
> 
> fai-kernels 1.12 is available from 
> http://www.informatik.uni-koeln.de/fai/download/sarge/
Thanks

> Did you try to use the stock Debian kernel with an initrd for booting FAI?
Not yet

Steffen

NIS on private network

2007-05-31 Thread Steffen Grunewald

Hi,

this might be a semi-OT question, feel free to flame me off-list...

I've got to setup NIS on a compute cluster. The NIS master server has two
network interfaces, eth0 is public and eth1 private. Of course, NIS should
be delivered only to the private network (/etc/ypserv.securenets).
Apparently there is no way to force ypbind to use the private network if
I use the broadcast option (command-line or /etc/yp.conf): requests will
be sent out only via eth0 (not even loopback!) and will get stuck. Same 
for other head nodes - for obvious reasons, the compute nodes are fine.

Of course one solution would be to specify ypserver entries in /etc/yp.conf
(instead of broadcast), but: can this be done in a more general fashion?
(this is where FAI enters the stage: I'm hesitant to add yet another class)
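
If explicit servers turn out to be the way to go, the entries could at least
be generated from variables rather than kept in one file per class. A minimal
sketch, assuming hypothetical variables NISDOMAIN and NISSERVERS defined
somewhere in the config space (neither name comes from FAI itself):

```shell
#!/bin/sh
# Print yp.conf lines that bind to explicit servers instead of broadcasting.
# $1 is the NIS domain, the remaining arguments are server names or addresses.
yp_conf_lines() {
    domain=$1
    shift
    for server in "$@"; do
        printf 'domain %s server %s\n' "$domain" "$server"
    done
}
```

A script could then run
yp_conf_lines "$NISDOMAIN" $NISSERVERS > $target/etc/yp.conf
for every machine that must not broadcast.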

Hints welcome.

Cheers,
 Steffen


Re: New user do FAI

2007-06-05 Thread Steffen Grunewald

On Tue, Jun 05, 2007 at 02:03:55PM +0200, Thomas Lange wrote:
> >>>>> On Tue, 05 Jun 2007 12:29:05 +0100, Jaime Ventura <[EMAIL PROTECTED]> 
> >>>>> said:
> 
> > As far as I understood by reading the documentation, NFS is needed for 
> NFS is needed for the nfsroot. The kernel is received via TFTP.
BTW, looking at the new (nfs://server/tree) format for specifying the nfsroot,
are there plans to provide support for other means as well (sshfs, ocfs, ...)?

> > If so, can I use other that NFS?
> Yes, technically it's possible to replace NFS with a huge ramdisk, but you will
> not want to do this, because it's much more difficult to set up. 
I could imagine (ab)using a partition for this - the same way Solaris did
(still does?) when installing a miniroot.

> > scripts,... )  be transferred to the client? NFS also?
> For accessing the Debian packages you can use NFS, FTP, HTTP. man
> sources.list will help you.
config space, in most cases, should be small enough to be kept in ramdisk
(especially since the packages directory has been moved away). I believe
CVS checkout would work this way...


Speaking of general documentation, what's the timeline for FAI 3 manuals?
I'll have to move to 3.1 soon, from a pure sarge (2.8.4) install. I guess
the S[0-9][0-9]* -> [0-9][0-9]* transition is the easiest part :-/


Cheers,
 Steffen


Overloaded server

2007-06-22 Thread Steffen Grunewald

Hi,

yesterday I tried to FAI setup about 600 nodes using one file/webserver.
While mounting the nfsroot took its time for some of the nodes, and there
was no problem, I got lots of http download errors (resulting in that
famous Broken package, and error 25600 messages).
Is there a way to allow more tolerance (longer timeouts) to the apt 
operations?
At the moment I try to catch this situation in a chboot hook, but that
would deprive me of the correctly downloaded error messages... how much 
effort would it be to run the savelog task before the chboot task? (and
has there been a reason for the current order?)
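
For the tolerance part, apt.conf(5) documents Acquire::Retries and
Acquire::http::Timeout. A sketch of a helper that prints such a fragment;
the values and the file name in the usage note are assumptions, and where
the file has to live so that FAI's package installation picks it up is not
something I have verified:

```shell
#!/bin/sh
# Print an apt.conf fragment asking apt to retry failed downloads $1 times
# and to use an HTTP timeout of $2 seconds.
apt_tolerance_conf() {
    printf 'Acquire::Retries "%s";\n' "$1"
    printf 'Acquire::http::Timeout "%s";\n' "$2"
}
```

Example: apt_tolerance_conf 5 300 > $target/etc/apt/apt.conf.d/90tolerance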

Cheers,
 Steffen


Re: Overloaded server

2007-06-22 Thread Steffen Grunewald

On Fri, Jun 22, 2007 at 05:08:14PM +0200, Thomas Lange wrote:
> >>>>> On Fri, 22 Jun 2007 16:42:40 +0200, Steffen Grunewald <[EMAIL 
> >>>>> PROTECTED]> said:
> 
> > yesterday I tried to FAI setup about 600 nodes using one file/webserver.
> Wow. Did you start them all at once? Or with some time (how long)
> between each poweron? I would have expected more problems with TFTP.

Spread over about 20 minutes, in batches of about 100. :-)

> > was no problem, I got lots of http download errors (resulting in that
> > famous Broken package, and error 25600 messages).
> Maybe the webserver only allows a certain number of connections at a
> time? How was the CPU load?

don't know. standard apache (2?) of Sarge.
CPU load was not too high I guess ... I'm installing 50 nodes right now, and
some nfsds are busy, load <<5. I've seen loads up to 60.
Is there a similar limitation to apache threads as it is with nfsds?

> I don't understand that.

grep -q "Broken package" $LOGDIR/fai.log && reboot
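
Spelled out a bit more, as a sketch: the same check, plus a copy of the logs
to a place that survives the reboot. The destination directory is
hypothetical; only $LOGDIR and the "Broken package" message are taken from
the setup described here.

```shell
#!/bin/sh
# Succeed if the FAI log file $1 reports broken packages.
has_broken_packages() {
    grep -q "Broken package" "$1"
}

# Copy all *.log files from directory $1 into a timestamped
# subdirectory of $2, so they are not lost when the node reboots.
keep_logs() {
    dest=$2/$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest" && cp "$1"/*.log "$dest"/
}
```

In the hook this would read something like
has_broken_packages "$LOGDIR/fai.log" && keep_logs "$LOGDIR" /some/shared/dir && reboot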

> > but that
> > would deprive me of the correctly downloaded error messages... how much 
> > effort would it be to run the savelog task before the chboot task? (and
> > has there been a reason for the current order?)
> Just log into a machine and read /tmp/fai/*.log

I cannot if it reboots itself. Reboot by hand is counterintuitive with FAI.

The only (small) problem is that I cannot track machines which might have
different problems, cycling through the install forever...

So why does chboot precede savelog? Any reasonable reason?

S

Re: BIOS

2007-08-09 Thread Steffen Grunewald

On Thu, Aug 09, 2007 at 10:42:41AM +0200, Thomas Lange wrote:
> > On Thu, 9 Aug 2007 09:37:02 +0200, Henning Fehrmann <[EMAIL PROTECTED]> 
> > said:
> 
> > we are interested in flashing a BIOS image and in manipulating the
> > NVRAM of the motherboard automatically.
> Wow. Do you really need this?
> 
> > Unfortunately, using certain vendors, the access to the NVRAM is not
> > straightforward. These vendors are offering DOS tools only, to write in
> > the NVRAM, hence, we have to boot a DOS image and here starts the trouble.
> You can boot a DOS or floppy image using PXE. This is how a
> pxelinux.cfg looks like for booting a floppy image: 
> 
> default dos
> label dos
>  kernel memdisk
>  append keeppxe initrd=floppy.img
> 
> But AFAIR I had no success, because the DOS flashing utilities seem
> to want a real floppy, not a fake one.

It worked here, but I think that's something Henning has got running too.
The problem is to tell the server to swap its PXE config file for this
particular machine *after* the flash has been completed but *before*
rebooting (automatically or by power cycle/IPMI reset). It'd be necessary to
send some kind of "signal" to the server (a dummy tftp request is what I've
done in the past, at least from a tomsrtbt image I used to perform some
partitioning magic). Therefore, it would be nice to have a network stack
under freedos (which the BIOS flash disks nowadays are based on).

> > Optimally, using the DOS environment flashes the BIOS, sets the
> > NVRAM and sends a message to the FAI server to prepare the next boot of
> > the clients for the installation.
> You could send a message to the faimond which can change the pxelinux
> configuration.

That's the problem: to get a reasonable message *somewhere* you'd need a TCP/IP
stack. Or am I missing something?

Cheers,
 Steffen

Re: [Fwd: Re: BIOS]

2007-08-09 Thread Steffen Grunewald

On Thu, Aug 09, 2007 at 11:45:43AM +0200, Carsten Aulbert wrote:
> We are currently looking into the package offered here:
> 
> http://www.bgnett.no/~giva/index.html

That's the 32-bit version of WatTCP, isn't it? I couldn't find proper
documentation (or a binary build!) when I last looked... If you succeed,
please share ...

Cheers
 Steffen

Re: BIOS

2007-08-16 Thread Steffen Grunewald

On Wed, Aug 15, 2007 at 05:44:49PM +0200, Henning Fehrmann wrote:
> 
> We worked a while on the problem of flashing a bios and writing into the 
> nvram of the motherboard.
> Here is a recipe which is still under development. Maybe somebody needs it.

Congratulations on getting this far!

> Create a bootable dos image
> 
> 1. Go to http://www.freedos.org/ and download an installation CD image,
>    burn it onto a CD.
> 2. Install freedos on a box. We tried it on
>    1. a Single-Dual Core Opteron 2218
>       1. with a Supermicro H8SSL-i2 board
>       2. and WD 1600YS SATA hd's
>    2. a Single-Dual Core Xeon 3060
>       1. with a Supermicro PDSML-LN2 board
>       2. and WD 1600YS SATA hd's
>    3. We have not been able to install freedos on a box with a Fujitsu
>       Siemens D2461-A2 board, but you can read out and write the board's
>       nvram in a running system.
> 3. Create a dos partition which starts from the first sector using the dos
>    fdisk tool and format it with fat16. Do not make this partition too big,
>    since it has to go over the net with pxeboot. We took approximately 32M.
> 4. Install a basic dos environment with mbr writing tools (smbtmgrx) and
>    other programs you consider to be useful.
> 5. Reboot the box and go into the dos environment, execute a mbr writing
>    tool.
> 6. Reboot again using a liveCD Linux system (Knoppix).
> 7. Copy the first 63+N blocks of the hd into a file:
>    dd if=/dev/sda of=hd.img bs=512 count=63
> 8. Copy the complete dos partition into another file:
>    dd if=/dev/sda1 of=dos.img bs=512
> 9. Combine the files:
>    dd if=dos.img of=hd.img bs=512 seek=63
>    and copy hd.img onto the tftp server.

This IMHO could be done without touching a real disk's MBR using a file 
(untested).

> Modify the Image
> 
> The idea is to mount the image to put different tools on there. 
> Unfortunately, the image we just created has an MBR, which mount complains 
> about. To get around that, we cut off the first (in our case: 63) blocks. The 
> dos partition starts on block 64. dd if=hd.img of=dos.img bs=512 skip=63. To 
> find out at which block the partition starts, you can search for the 
> ASCII-string "FRDOS4", which indicates the beginning of a dos partition. This 
> is a dos partition which can be loopback mounted. Copy the needed files into 
> the mounted path. After umount, you update the image of the hd partition 
> using dd again, with the seek parameter indicating the blocks you keep:
> dd if=dos.img of=hd.img bs=512 seek=63.

IIRC loopback mount has an option to skip to an offset inside the image...
man mount
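
For the record, the option is "offset=" (in bytes). A small sketch, assuming
512-byte blocks and the 63 blocks skipped in the recipe; the mount point is
just an example:

```shell
#!/bin/sh
# Byte offset of a partition that starts after $1 blocks of 512 bytes.
partition_offset() {
    echo $(( $1 * 512 ))
}
```

With that, mount -o loop,offset="$(partition_offset 63)" hd.img /mnt
would mount the dos partition directly from hd.img, without cutting the image
apart and gluing it together again with dd.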

> Fill the dos partition with the tools and drivers you need. For us, this was 
> basically:
> 
> * WATT-32 (TCP/IP stack for DOS) http://www.bgnett.no/~giva/ 

Is there a binary version I have missed when I last looked?

> * card driver http://www.georgpotthast.de/sioux/packet.htm

You might also use the UNDI driver that sits on top of the PXE in the card's 
firmware.

> In our case it worked with broadcom ethernet cards 
> * ssh2dos http://sourceforge.net/projects/sshdos
> * and misc. tools like vim 
> 
> 
> Booting the dos image
> 
> Assuming you use pxelinux, you can put this in the appropriate config file 
> for the boot start:
> 
>  default fai-generated
> 
>  label fai-generated
>  kernel kernel/memdisk
>  append initrd=path/to/hd.img
> 
> * After that, you should be able to boot into the dos image and see all the 
> tools you have put there. 
> * Next, you need to load the appropriate network drivers. 
> * Search for the bios flash and nvram reading and writing tools.
> * The ssh client works with a key, so you avoid the passwd prompt.
>   ssh the FAI server and change the pxelinux.cfg files to initialize the FAI 
> installation (fai-chboot). 

Of course that's a more elegant solution than having a fake TFTP request
which would trigger an action on the server side. Why didn't you go for a
simple rsh instead, which is already a prerequisite for FAI (writeback of
install logs)?

> * reboot
> 
> Put everything in the autoexec.bat and it works automatically. (Has not been 
> tested yet)

If you don't mind I'd like to have a look at your image. I'd suggest keeping
the interface to the possible tasks as clean as possible so that changes to
the payload have to be done in only one place (our vendor for example uses
a zip file that contains the payload, which is unpacked into a ramdisk, and
its setup.bat is started - this way the autoexec.bat stays unmodified if
you replace the payload) - KISS :-)
(My images are in CVS, faiconfig/files/boot/fai/floppy-images/)

Cheers,
 Steffen

Re: Configuration Management and Monitoring of a Debian Etch Beowulf Cluster

2007-08-31 Thread Steffen Grunewald

On Fri, Aug 31, 2007 at 10:18:58AM +0330, Farid Behnia wrote:
> Hi,
> 
> I've put together a simple 2-node cluster using Debian etch , OpenMPI , FAI
> & Cfengine.
> I'm looking for ideas that can help me with building a better self-healing
> cluster. Right now I'm making rule files for cfengine and would acknowledge
> any input on sample files and important configurations that need to be made
> for the cluster's health. (Although it's site-specific but I'm sure I can
> get good hints out of them)
> 
> However I'd also be glad to see if you have any monitoring system in mind
> that can cooperate with cfengine in the maintenance job. I've looked briefly
> into Ganglia and Nagios so far. It seems Ganglia is mostly meant for large
> (groups of) clusters and focuses on hw resources. Nagios seems to be
> better-suited for my job, but the gurus at cfengine mailing list believe
> that cfenvd & cfexecd can provide equal monitoring & recovery capability (in
> terms of response time).
> What's your take on either of them?
> 
> Thanks beforehand to anyone sharing their experience.

Although it's not exactly FAI related, you might have a look at Gluster:
http://www.gluster.org

Steffen


Re: configuration to boot from network card

2007-11-09 Thread Steffen Grunewald

On Fri, Nov 09, 2007 at 02:15:14PM +0100, Thomas Lange wrote:
> > On Fri, 09 Nov 2007 13:59:32 +0100, "antares atlantide" <[EMAIL 
> > PROTECTED]> said:
> 
> > [EMAIL PROTECTED]:/srv/fai/nfsroot/bin# dpkg -l | grep fai
> > ii  fai-client   3.2.1       Fully Automatic Installation client package
> > ii  fai-doc      3.2.1       Documentation for FAI
> > ii  fai-kernels  1.17+etch5  special kernels for FAI (Fully Automatic Installation)
> > ii  fai-server   3.2.1
> You must not use fai-kernels with fai 3.2.1!

Shouldn't there be a Conflicts: then?

Cheers,
 Steffen

FAI install over serial console, manual interventions?

2008-04-21 Thread Steffen Grunewald

Hi,

we've run into the problem that there's only a Serial-over-LAN console
available (BMC card, with IP address pre-set or assigned via DHCP), and
there might be no sshd running yet, or the network is misconfigured in
such a way that ssh connections aren't possible.
Is there a reliable way to interrupt FAI's operation and open a shell?
Last time I tried ^C, I directly went into rebooting :(

Cheers,
 Steffen


Re: Need simulate apt-get with backports

2008-05-30 Thread Steffen Grunewald

On Fri, May 30, 2008 at 01:52:28PM +0200, Henning Sprang wrote:
> TOUZEAU DAVID wrote:
> >
> >apt-get install debian-backports-keyring
> >apt-get -t etch-backports install postfix
> >
> >how to simulate it in /fai/config/package_config file ?*
> 
>  -t is not supported in package_config files.
> 
> Possibilities:
> 
> * You could look at the install_packages script and add an additional 
> PACKAGES type for your use. This will have several implications, 
> depending on how complex this is to be solved, maybe Thomas will release 
> a new version with this patch  soon
> 
> * install the packages you need with a script in scripts (not really 
> nice if you have more than a small set of packages)
> 
> * Try to use pinning to get your packages installed

Another one: 

- have your own repository of additional packages
- get the source package from backports and re-build it 
(that's what I often had to do if amd64 packages were lacking)
- or download the binary package and install it locally

This has the advantage that when a newer version comes out in the main
distro, you won't get stuck with the backported one (and there's no ugly
code in your FAI installation that no one could explain after 2 years).

Of course this means that you have to resolve dependencies properly,
and perhaps host a lot more packages.
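
For comparison, the pinning route from the quoted list would need a stanza in
the target's apt preferences. A sketch that only prints such a stanza; the
priority 999 and the idea of generating it from a script are assumptions, and
the backports line still has to be present in sources.list:

```shell
#!/bin/sh
# Print an apt preferences stanza that pins package $1 to archive $2
# with priority $3.
pin_stanza() {
    printf 'Package: %s\nPin: release a=%s\nPin-Priority: %s\n' "$1" "$2" "$3"
}
```

Example: pin_stanza postfix etch-backports 999 >> $target/etc/apt/preferences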

S


Re: FAI does not need a tftpd daemon any more, which supports the tsize option

2008-09-25 Thread Steffen Grunewald

On Wed, Sep 24, 2008 at 06:10:41PM +0200, Thomas Lange wrote:
> I just read in the changelog of syslinux-common, that pxelinux 3.70 and
> newer versions do not need a tftpd which has the tsize option. Currently
> FAI recommends the package tftp-hpa which supports this option, we may
> remove this in the future.
> 
> From the changelog of syslinux-common:
> 
>  * PXELINUX: We no longer require a TFTP server which supports
>the tsize option for all transfers.

While that's true, the TFTP server doesn't answer pxelinux requests only.
If there are Alphas involved (yes, I still got some for FAI toys), there
is a remote (even a real) chance that at least the "-r blksize" option to
tftpd-hpa makes sense (I'm talking about XP1000's with the latest, still
buggy firmware, which would use wrong ports if that TFTP option wasn't
handled specially).

Just as a reminder... there's another world beyond PXE.

Steffen


Re: Is it possible to install in a row 2 differents system.

2008-09-25 Thread Steffen Grunewald

On Wed, Sep 24, 2008 at 10:18:01PM +0200, william Famy wrote:
> Hi.
> 
> Is it possible to install 2 different systems on a computer in one go?
> 
> Example: I want to install a simple lenny (for testing) and an encrypted
> etch on the same laptop.
> 
> Is it possible to do it?
> 
> Do I have to do 2 different installs?

I guess you could tweak the final FAI step (which would end in a reboot,
after *removing* the PXE file on the server) so that it doesn't remove
but replace the file, then reboot. (Hooks are your friends.)
You may introduce some parameters into the PXE file, to tell FAI what to do ...

Steffen

Problem with re-installing old server (setup-storage GPT related)

2009-06-30 Thread Steffen Grunewald

I was trying to re-install an old server that had been set up with pre-Etch
FAI. Since there was no official GPT support at that time, I made a hook
that performed the task.

Here's the output of parted -s /dev/sdb (for the already partitioned disk):

(parted) unit chs print free

Disk /dev/sdb: 848786,60,41
Sector size (logical/physical): 512B/512B
BIOS cylinder,head,sector geometry: 848786,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start   End          File system  Name     Flags
 1      0,0,34  848786,60,8  xfs          primary

(parted) unit B print free

Disk /dev/sdb: 6981504466943B
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End             Size            File system  Name     Flags
 1      17408B  6981504450047B  6981504432640B  xfs          primary

I defined the partition in disk_config:

disk_config sdb disklabel:gpt preserve_always:1
primary /storage   1-  xfs rw

When running the partition task, I get an error:

Calling task_partition
Partitioning local harddisks using setup-storage
Starting setup-storage 1.0.3
Using config file: /var/lib/fai/config/disk_config/HEXE_STORAGE
Executing: parted -s /dev/sda unit TiB print
Executing: parted -s /dev/sda unit B print free
Executing: parted -s /dev/sda unit chs print free
Executing: parted -s /dev/sdb unit TiB print
Executing: parted -s /dev/sdb unit B print free
Executing: parted -s /dev/sdb unit chs print free
Executing: vgdisplay --units m -s
Executing: mdadm --detail --scan --verbose -c partitions
Disk /dev/sdb is too small - at least 6981504467456 bytes are required

- apparently some rounding-up is happening somewhere?
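
Some arithmetic on the numbers above, as a sketch and not a diagnosis of
setup-storage. It assumes 512-byte sectors and that parted's "Disk" line in
unit B is the offset of the last byte (like the partition End value), which
is my reading of the output:

```shell
#!/bin/sh
# Difference in bytes between a required size $1 and a reported size $2.
shortfall_bytes() {
    echo $(( $1 - $2 ))
}

required=6981504467456      # from the setup-storage error message
reported=6981504466943      # "Disk /dev/sdb" line of parted, unit B
part_start=17408            # partition 1 start, unit B
part_end=6981504450047      # partition 1 end (inclusive), unit B

echo "short by: $(shortfall_bytes "$required" "$reported") bytes"
echo "required, in sectors: $(( required / 512 ))"
echo "disk, in sectors:     $(( (reported + 1) / 512 ))"
echo "sectors before the partition: $(( part_start / 512 ))"
echo "sectors after the partition:  $(( (reported - part_end) / 512 ))"
```

The shortfall is 513 bytes, i.e. one sector once the inclusive end is
accounted for; the partition leaves 34 sectors in front and 33 behind. If
setup-storage wants 34 at the end as well, that would explain the missing
sector, but that part is a guess.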

Back to old style for now...

Cheers,
 Steffen


cfagent script replaces with empty string?

2009-06-30 Thread Steffen Grunewald

I'd like to define the network interfaces statically. Therefore, I have
prepared a /etc/network/interfaces file (with the proper class), the
corresponding section reads:

auto eth0
iface eth0 inet static
address IPADDRESS
netmask IPNETMASK
network IPNETWORK

The script reads:

#!/usr/sbin/cfagent -v -f

# modify setup: /etc/network/interfaces

control:
any::
actionsequence = ( editfiles )
EditFileSize = ( 65000 )

editfiles:
# first for all classes
any::
{ ${target}/etc/network/interfaces
  ReplaceAll "IPADDRESS" With "${IPADDR}"
  ReplaceAll "IPNETMASK" With "${NETMASK}"
  ReplaceAll "IPNETWORK" With "${NETWORK}"
}

And, of course, the environment variables are set (checked in a shell script
before and after).

Nevertheless, $NETMASK isn't used, and the following fragment results:

auto eth0
iface eth0 inet static
address 10.100.200.13
netmask
network 10.100.0.0

Thanks to the -v option to cfagent, I can see the following:

Begin editing /target/etc/network/interfaces
Checking for global replace/IPADDRESS/10.100.200.13
Inserting #address 10.100.200.13
Delete Item: #address IPADDRESS
Inserting address 10.100.200.13
Delete Item: address IPADDRESS
Checking for global replace/IPNETMASK/
Inserting #netmask
Delete Item: #netmask IPNETMASK
Inserting netmask
Delete Item: netmask IPNETMASK
Checking for global replace/IPNETWORK/10.100.0.0
Inserting #network 10.100.0.0
Delete Item: #network IPNETWORK
Inserting network 10.100.0.0
Delete Item: network IPNETWORK
End editing /target/etc/network/interfaces

With an old version of FAI, and cfengine instead of cfagent, this used to work.

Any suggestions?
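
As a stopgap, the same substitution can be done from a plain shell script
with sed instead of cfagent. This is a different technique, not a fix for
the cfagent behaviour; the function name is made up, and the values are
assumed not to contain "/" characters:

```shell
#!/bin/sh
# Replace the placeholders in file $1 with the values of IPADDR, NETMASK and
# NETWORK. Refuses to touch the file if any of the three is empty.
fill_interfaces() {
    file=$1
    for v in "${IPADDR:-}" "${NETMASK:-}" "${NETWORK:-}"; do
        [ -n "$v" ] || { echo "empty variable, not editing $file" >&2; return 1; }
    done
    sed -e "s/IPADDRESS/$IPADDR/g" \
        -e "s/IPNETMASK/$NETMASK/g" \
        -e "s/IPNETWORK/$NETWORK/g" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}
```

Called as fill_interfaces "$target/etc/network/interfaces", it at least
fails loudly when a variable is empty instead of writing a bare "netmask".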


Cheers,
 Steffen


cfagent 2.2.8 doesn't behave ...

2009-07-01 Thread Steffen Grunewald

... as cfengine 2.1.20 did.
In particular, I'm constantly running into problems when using "^" to
denote a start-of-line, and I've also seen strange behaviour when it comes
to variable substitutions (with definitions made in the environment).

Is this a known problem? bugs.debian.org isn't very detailed...

Cheers,
 Steffen

Re: cfagent 2.2.8 doesn't behave ...

2009-07-01 Thread Steffen Grunewald

On Wed, Jul 01, 2009 at 02:03:43PM +0200, Steffen Grunewald wrote:
> ... as cfengine 2.1.20 did.
> In particular, I'm constantly running into problems when using "^" to
> denote a start-of-line, and I've also seen strange behaviour when it comes
> to variable substitutions (with definitions made in the environment).

Never mind about the caret issue. There was another obstacle for the pattern
to be matched. I'm still puzzled about the ${VAR} thing, though.

Apologies for the noise...
 S

Create FAI NFSroot for i386 arch on amd64 host?

2009-07-02 Thread Steffen Grunewald

Is it possible (without setting up a i386 chroot) to create a i386 nfsroot
for FAI, on a amd64 machine?

Cheers,
 Steffen

Re: Create FAI NFSroot for i386 arch on amd64 host?

2009-07-06 Thread Steffen Grunewald

On Thu, Jul 02, 2009 at 09:11:47PM +0200, Michael Tautschnig wrote:
> > Steffen Grunewald wrote:
> > > Is it possible (without setting up a i386 chroot) to create a i386 nfsroot
> > > for FAI, on a amd64 machine?
> > 
> > I did not try that, but as far as I know, debootstrap has an option to
> > explicitly choose the architecture, and make-fai-nfsroot.conf has a
> > setting DEBOOTSTRAP_OPTIONS or so, where you can add that.
> > 
> 
> Adding --arch i386 to FAI_DEBOOTSTRAP_OPTS (of make-fai-nfsroot.conf) should
> suffice (at least this is what we do over here). 

Thanks, that did the trick.
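
For the archives, the line in make-fai-nfsroot.conf then looks roughly like
this (a sketch; it keeps whatever options were already set, which is an
assumption about the existing file):

```shell
# make-fai-nfsroot.conf is read as shell, so the option can simply be appended.
FAI_DEBOOTSTRAP_OPTS="${FAI_DEBOOTSTRAP_OPTS:-} --arch i386"
```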

Apologies for my particular blind spot,
 Steffen

differences between setup_harddisks and setup-storage

2009-07-08 Thread Steffen Grunewald

I had to upgrade a rather big cluster from Etch to Lenny this week.
The old fai server was still running fai_2.8.4sarge1, but the (Etch) 
nfsroot had fai-client_3.1.8 installed (Lenny: 3.2.17~lenny1).
Since the need to preserve several partitions always causes me headaches
I would have preferred to continue using setup_harddisks, but somehow
the USE_SETUP_STORAGE setting escaped my attention.

I found that *even if I preserve_always* a certain partition at
the "end" of the disk, setup-storage would complain about it (not ending
at a cylinder boundary: this is a RAID on a 3ware controller which would
fake geometries anyway) - I had to back up, drop the preserve_* setting,
and restore after installation.
The same happened to a 7TB partition (on an Areca controller) which had to be
created by hand previously (leaving the minimum of 34 sectors for GPT
information at the beginning and end): I had to work around this (by
not declaring the disk at all in disk_config, and using my old manual
setup scripts to conditionally mkfs and add the fstab line).

In addition, I noticed that the new style would make partitions on "real
disks" (no geometry translation) _smaller_ than they were before, resulting
in a gap:

Disk /dev/sda: 164.6 GB, 16469620 bytes
255 heads, 63 sectors/track, 20023 cylinders, total 321672960 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x39d139d0

   Device Boot     Start         End      Blocks   Id  System
/dev/sda1   *         63     5237189     2618563+  83  Linux
/dev/sda2        5237190     7325639     1044225   82  Linux swap / Solaris
/dev/sda3        7325640     9414089     1044225   82  Linux swap / Solaris
/dev/sda4        9414090   321669494   156127702+   f  W95 Ext'd (LBA)
/dev/sda5        9414153    11502539     1044193+  83  Linux
/dev/sda6       11502603    13590989     1044193+  83  Linux
/dev/sda7       13591053    16723664     1566306   83  Linux
/dev/sda8       16820118   321669494   152424688+  83  Linux

(note the gap between sda7 and sda8 - about 96000 sectors!)

The corresponding disk_config partition sizes didn't change for years:

disk_config sda disklabel:msdos preserve_always:8 bootable:1
primary  /         2560   ext3  rw,errors=remount-ro  createopts="-m1"
primary  swap      1024   swap  rw
primary  swap      1024   swap  rw
logical  /tmp      1024   ext3  rw,nosuid             createopts="-m0"
logical  /var      1024   ext3  rw                    createopts="-m1"
logical  /opt      1536   ext3  rw                    createopts="-m1"
logical  /scratch  1-     xfs   rw

Apparently the rounding strategy has changed... intentionally?
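
The size of the gap can be checked with a little shell arithmetic. This is
only a sketch using the sector numbers from the fdisk listing above; the
variable names are made up for illustration.

```shell
# Sector numbers copied from the fdisk listing above.
sda7_end=16723664
sda8_start=16820118
sectors_per_cyl=$((255 * 63))        # heads * sectors/track, as reported

gap=$((sda8_start - sda7_end - 1))   # unallocated sectors between the two
echo "gap: $gap sectors"
echo "   = $((gap / sectors_per_cyl)) cylinders + $((gap % sectors_per_cyl)) sectors"
```

By these numbers the gap is a whole number of cylinders plus one track, which
would fit a different rounding to cylinder boundaries; that is an observation
from the listing, not a statement about setup-storage internals.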

Cheers,
 Steffen

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html

grub issues during FAI install

2009-07-08 Thread Steffen Grunewald

I'm using the /boot/grub/menu.lst/GRUB example that comes with FAI, and
the corresponding postinst. (Lenny, AMD64)
After fcopy'ing /boot/grub/menu.lst, I usually see the following messages
produced by the postinst script:

grub-probe: error: Cannot open `/boot/grub/device.map'
/usr/sbin/grub-install: line 374: [: =: unary operator expected
Installation finished. No error reported.
This is the contents of the device map /target/boot/grub/device.map.
Check if this is correct or not. If any of the lines is incorrect,
fix it and re-run the script `grub-install'.

(hd0)<->/dev/sda
(hd1)<->/dev/sdb
Searching for GRUB installation directory ... found: /boot/grub
Searching for default file ... found: /boot/grub/default
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz-2.6.26-2-amd64
Updating /boot/grub/menu.lst ... done

Grub installed on /dev/sda on (hd0,0)


Since the "line 374" error appears to be related to a missing device.map, 
and everything seems to work fine, I'm not worried too much about it.
But - I have a single machine that would cause trouble with the same
(GRUB) setup. It's got two disks as well, and should behave the same. 
It doesn't (same place):

More than one install_devices?
Usage: grub-install [OPTION] install_device
Install GRUB on your drive.

  -h, --help  print this message and exit
  -v, --version   print the version information and exit
  --root-directory=DIRinstall GRUB images under the directory DIR
  instead of the root directory
  --grub-shell=FILE   use FILE as the grub shell
  --no-floppy do not probe any floppy drive
  --force-lba force GRUB to use LBA mode even for a buggy
  BIOS
  --recheck   probe a device map even if it already exists

INSTALL_DEVICE can be a GRUB device name or a system device filename.

grub-install copies GRUB images into the DIR/boot directory specfied by
--root-directory, and uses the grub shell to install grub into the boot
sector.

Report bugs to .
Can't open /target/boot/grub/device.map
Searching for GRUB installation directory ... found: /boot/grub
Probing devices to guess BIOS drives. This may take a long time.
Searching for default file ... Generating /boot/grub/default file and setting 
the default boot entry to 0
Searching for GRUB installation directory ... found: /boot/grub
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz-2.6.26-2-amd64
Updating /boot/grub/menu.lst ... done

Grub installed on /dev/sda /dev/sdb on
ERROR: postinst returned code 1


- and closer examination shows that in disk_var.sh, there's a line
BOOT_DEVICE="/dev/sda /dev/sdb"
when it should read
BOOT_DEVICE="/dev/sda"
only.

For whatever reason this has happened: it shouldn't. There could be
several reasons to have multiple bootable devices (e.g. different
OSes), and they should be selectable in the BIOS. 
A (visible) complaint should be emitted instead.
Perhaps, in such cases ($BOOT_DEVICE consisting of multiple entries)
the one corresponding to $ROOT_PARTITION (or the first one) should be 
used?
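
A minimal sketch of the "use the first one, but complain visibly" variant.
The helper name first_boot_device is hypothetical and this is not what FAI
does; it could be called from the GRUB postinst after disk_var.sh has been
read.

```shell
# Hypothetical helper: reduce a multi-entry BOOT_DEVICE to its first entry.
first_boot_device() {
    # Unquoted on purpose: word splitting separates the device names.
    set -- $1
    if [ "$#" -gt 1 ]; then
        echo "warning: multiple boot devices ($*), using $1" >&2
    fi
    echo "$1"
}

BOOT_DEVICE="/dev/sda /dev/sdb"
BOOT_DEVICE=$(first_boot_device "$BOOT_DEVICE")
echo "$BOOT_DEVICE"
```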


Cheers,
 Steffen

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html

mysql-server installed by FAI?

2009-07-21 Thread Steffen Grunewald

Is there anyone around who has gathered some experience installing mysql
server within FAI? Apparently, the _existing_ debconf template
"start_on_boot" isn't obeyed, and I'm running into trouble with the
setting for the root password. 
I'd appreciate it if you'd be willing to share your ideas.

Cheers,
 Steffen

Re: booting from host with multiple network cards

2009-11-23 Thread Steffen Grunewald

On Mon, Nov 23, 2009 at 03:22:30PM +0100, Michael Goetze wrote:
> Toomas Tamm wrote:
> > On Sun, Nov 22, 2009 at 01:38:16PM +0100, Thomas Lange wrote:
> > 
> >> * bootoption ethdevice: use specified network device for network booting
> >> (PXE) instead of default (being eth0). Feature contributed by Helge
> >> Wagner. Usage example: ethdevice=eth1
> > 
> > Does anyone know/remember, why the problem cannot be fixed properly,
> > i.e. make live-initramfs try all interfaces and use the first one
> > where a DHCP reply is obtained?
> > 
> > This used to be possible with fai-kernels under etch, so why is it not
> > possible with live-initramfs?
> 
> Red Hat's Kickstart also does this, I agree it would definitely be much
> nicer.
> 
> However, lacking that, the ethdevice boot option would at least be
> better than what we have now...

I suppose there is no chance to find out which interface had been used for the
actual PXE booting? 
For IPMI reasons, eth0 of multi-homed machines may be a public interface
and eth1 the "private" one which is connected to the installation network.
In such a case, the first DHCP response might be returned from the public
network even if PXE booting took place on eth1. (which has preference in
the BIOS boot order, btw)

I still don't understand why we can't have back ip=::eth1

Cheers,
 Steffen

instsoft hangs, same place for multiple nodes

2010-01-08 Thread Steffen Grunewald

I've got to re-install a couple of nodes, and started the usual FAI routine.
This time, all 5 nodes get stuck in the same place:

# tail fai.log
Get:248 http://10.100.200.98 lenny/main rstatd 4.0.1-3 [15.8kB]
Get:249 http://10.100.200.98 lenny/main rusers 0.17-7.1 [13.0kB]
Get:250 http://10.100.200.98 lenny/main rusersd 0.17-7.1 [11.0kB]
Get:251 http://10.100.200.98 packages.amd64/ smartmontools 5.38+svn091119-1sg 
[418kB]
Get:252 http://10.100.200.98 lenny/main sudo 1.6.9p17-2 [188kB]
Get:253 http://10.100.200.98 lenny/main wakeonlan 0.41-10 [11.4kB]
Get:254 http://10.100.200.98 lenny/main xfsprogs 2.9.8-1lenny1 [1376kB]
Get:255 http://10.100.200.98 lenny/main xfsdump 2.2.48-1 [306kB]
Get:256 http://10.100.200.98 lenny/main octave-plplot 5.9.0-8 [397kB]
Get:257 http://10.100.200.98 lenny/main rsh-server 0.17-14 [39.3kB]

tasks related:
root  4791  0.0  0.8  45236 17244 ttyS1S+   10:42   0:00 
install_packages
root  4792  0.0  0.0   3808   524 ttyS1S+   10:42   0:00 tee -a 
/tmp/fai/software.log
root  4794  0.1  1.5  83788 32612 ttyS1Sl+  10:42   0:03 aptitude -R -y 
-o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold install 
rsh-server telnet sudo iotop ethtool
ethstatus smart
root  4798  0.1  0.0  18816  1968 ttyS1S+   10:42   0:04 
/usr/lib/apt/methods/http

# cat /proc/4794/cmdline | tr '\0' ' '
aptitude -R -y -o Dpkg::Options::=--force-confdef -o 
Dpkg::Options::=--force-confold install rsh-server telnet sudo iotop ethtool 
ethstatus smartmontools wakeonlan ipmitool parted at dash ash
rstat-client rstatd rusers rusersd module-init-tools cramfsprogs ntp ntpdate 
nfs-kernel-server exim4-daemon-light login wget w3m xfsdump xfsprogs autofs apt 
apt-doc apt-file apt-show-versions
apt-utils debian-keyring dpkg fakeroot aptitude alien g++ g++-4.1 g++-4.2 
g++-4.3 gcc gcc-4.1 gcc-4.2 gcc-4.3 gfortran gfortran-4.2 gfortran-4.3 python 
python2.4 python2.5 python2.4-minimal
python2.5-minimal python-dev python2.4-dev python2.5-dev python-doc 
python2.4-doc python2.5-doc python-gtk2 python-matplotlib 
python-matplotlib-data python-matplotlib-doc python-numarray
python-numarray-ext python-numeric python-numeric-ext python-numeric-tutorial 
python-numpy python-numpy-doc python-numpy-ext perl libterm-readline-perl-perl 
libtimedate-perl octave octave2.1 octave3.0
octave-doc octave2.1-doc octave3.0-doc octave-headers octave2.1-headers 
octave3.0-headero444:/tmp/fai# tail fai.logforge octave-plplot octave-sp 
autoconf automake automake1.4 automake1.7 automake1.9
autotools-dev make pkg-config binutils

How can I find out what's going wrong? A serial-over-LAN connection to the
BMC stops transmitting even 31 lines before Get:257 :(

No changes made to the FAI configuration space.

Any ideas?

S
-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html

Re: instsoft hangs, same place for multiple nodes

2010-01-08 Thread Steffen Grunewald

On Fri, Jan 08, 2010 at 01:01:06PM +0100, Thomas Lange wrote:
> >>>>> On Fri, 8 Jan 2010 12:42:34 +0100, Steffen Grunewald 
> >>>>>  said:
> 
> > This time, all 5 nodes get stuck in the same place:
> 
> > # tail fai.log
> > Get:248 http://10.100.200.98 lenny/main rstatd 4.0.1-3 [15.8kB]
> > Get:249 http://10.100.200.98 lenny/main rusers 0.17-7.1 [13.0kB]
> > Get:250 http://10.100.200.98 lenny/main rusersd 0.17-7.1 [11.0kB]
> > Get:251 http://10.100.200.98 packages.amd64/ smartmontools 
> 5.38+svn091119-1sg [418kB]
> > Get:252 http://10.100.200.98 lenny/main sudo 1.6.9p17-2 [188kB]
> > Get:253 http://10.100.200.98 lenny/main wakeonlan 0.41-10 [11.4kB]
> > Get:254 http://10.100.200.98 lenny/main xfsprogs 2.9.8-1lenny1 [1376kB]
> > Get:255 http://10.100.200.98 lenny/main xfsdump 2.2.48-1 [306kB]
> > Get:256 http://10.100.200.98 lenny/main octave-plplot 5.9.0-8 [397kB]
> > Get:257 http://10.100.200.98 lenny/main rsh-server 0.17-14 [39.3kB]
> 
> 
> > How can I find out what's going wrong? A serial-over-LAN connection to 
> the
> > BMC stops transmitting even 31 lines before Get:257 :(
> 
> > No changes made to the FAI configuration space.
> Did the mirror of Debian packages change? Sometimes I also got a

As mirrors tend to...

I'm rebuilding the mirror for the arch affected now.

> stuck installation, but mostly if too many packages are installed at
> once. Therefore I set the variable MAXPACKAGES to a lower value. I use
> 500 or 300 here.

Since the download stopped at 257, I'll set this to 100 or 200.

> IMO this is a bug in apt, since it is the download that gets stuck, not FAI itself.

Any chance to get debugging output?

Steffen

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html

Re: instsoft hangs, same place for multiple nodes

2010-01-22 Thread Steffen Grunewald

On Fri, Jan 08, 2010 at 01:01:06PM +0100, Thomas Lange wrote:
> >>>>> On Fri, 8 Jan 2010 12:42:34 +0100, Steffen Grunewald 
> >>>>>  said:
> 
> > This time, all 5 nodes get stuck in the same place:
> 
> > # tail fai.log
> > Get:248 http://10.100.200.98 lenny/main rstatd 4.0.1-3 [15.8kB]
> > Get:249 http://10.100.200.98 lenny/main rusers 0.17-7.1 [13.0kB]
> > Get:250 http://10.100.200.98 lenny/main rusersd 0.17-7.1 [11.0kB]
> > Get:251 http://10.100.200.98 packages.amd64/ smartmontools 
> 5.38+svn091119-1sg [418kB]
> > Get:252 http://10.100.200.98 lenny/main sudo 1.6.9p17-2 [188kB]
> > Get:253 http://10.100.200.98 lenny/main wakeonlan 0.41-10 [11.4kB]
> > Get:254 http://10.100.200.98 lenny/main xfsprogs 2.9.8-1lenny1 [1376kB]
> > Get:255 http://10.100.200.98 lenny/main xfsdump 2.2.48-1 [306kB]
> > Get:256 http://10.100.200.98 lenny/main octave-plplot 5.9.0-8 [397kB]
> > Get:257 http://10.100.200.98 lenny/main rsh-server 0.17-14 [39.3kB]
> 
> 
> > How can I find out what's going wrong? A serial-over-LAN connection to 
> the
> > BMC stops transmitting even 31 lines before Get:257 :(
> 
> > No changes made to the FAI configuration space.

> Did the mirror of Debian packages change? Sometimes I also got a
> stuck installation, but mostly if too many packages are installed at
> once. Therefore I set the variable MAXPACKAGES to a lower value. I use
> 500 or 300 here.

Fixed by both re-building the mirror _and_ setting the limit to 200,
not sure what did the trick.

Thanks!


-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html

Re: setup-storage: Cannot determine size of /dev/cciss/c1d0 - scheme unknown

2010-05-06 Thread Steffen Grunewald

On Thu, May 06, 2010 at 01:45:48PM +0200, Mathieu Alorent wrote:
> Le jeudi 06 mai 2010 à 11:35 +0200, Mathieu Alorent a écrit :
> > "Cannot determine size of /dev/cciss/c1d0 - scheme unknown"
> 
> Found it !
> 
> here is the patch... Hope there is no side effects.
> 
> --- usr/share/fai/setup-storage/Init.pm.orig  2010-05-06 13:37:32.0 
> +0200
> +++ usr/share/fai/setup-storage/Init.pm   2010-05-06 13:36:19.0 
> +0200
> @@ -182,7 +182,7 @@
>  return (1, "/dev/$1", $2);
>}
>elsif ($dev =~
> -
> m{^/dev/(cciss/c\dd\d|ida/c\dd\d|rd/c\dd\d|ataraid/d\d|etherd/e\d+\.\d+)p(\d+)?$})
> +
> m{^/dev/(cciss/c\dd\d|ida/c\dd\d|rd/c\dd\d|ataraid/d\d|etherd/e\d+\.\d+)p?(\d+)?$})
>{
>  defined($2) or return (1, "/dev/$1", -1);
>  return (1, "/dev/$1", $2);

I suppose the line should read

> +
> m{^/dev/(cciss/c\dd\d|ida/c\dd\d|rd/c\dd\d|ataraid/d\d|etherd/e\d+\.\d+)(p\d+)?$})

Seems to make more sense?

Steffen

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html

Re: setup-storage: Cannot determine size of /dev/cciss/c1d0 - scheme unknown

2010-05-06 Thread Steffen Grunewald

On Thu, May 06, 2010 at 02:42:00PM +0200, Mathieu Alorent wrote:
> Are you sure ?
> 
> $2 will then contains "p1" or "p2"... instead of just the number !

You're right, but what sense would it make to split off a second digit from the
d part then?
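
A possible third variant, sketched here in POSIX shell with sed -E rather
than in Perl as in Init.pm: keep the "p" mandatory in front of a partition
number, but capture only the digits. The function name split_cciss and the
reduced pattern (cciss devices only) are assumptions for illustration; this
is not the code that went into setup-storage.

```shell
# Print "<disk> <partition>"; partition is -1 when the whole disk is named.
split_cciss() {
    out=$(printf '%s\n' "$1" |
        sed -n -E 's#^/dev/(cciss/c[0-9]d[0-9])(p([0-9]+))?$#/dev/\1 \3#p')
    [ -n "$out" ] || return 1      # not a cciss device name we understand
    disk=${out% *}                 # everything before the separating blank
    part=${out#* }                 # everything after it (may be empty)
    echo "$disk ${part:--1}"
}

split_cciss /dev/cciss/c1d0
split_cciss /dev/cciss/c1d0p2
```

With this pattern a name such as /dev/cciss/c1d05 is rejected instead of
being read as partition 5 of c1d0.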

S

$MAXPACKAGES

2010-12-16 Thread Steffen Grunewald

In the past few days, I had (after some additions to the mirrors) to reinstall
a couple of nodes, and all of them stopped dead after downloading 390 packages.
MAXPACKAGES, in FAIBASE.var, was set to 200.
Reducing MAXPACKAGES to 150 resulted in package counts going up to 324, but
the installation finally succeeded.
I have set MAXPACKAGES to 100 for future installs now, to be on the safe side,
but I'm still worried - is this behaviour intentional? From a naive point of
view, I'd expect the installer to cut the package list into chunks of
at most $MAXPACKAGES items, even after resolving dependencies... What
would happen if there was an "include everything for a certain project"
metapackage? Would even MAXPACKAGES=1 (or close to 1) fail to install?
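
To illustrate the naive expectation: a sketch that cuts a package list into
chunks of at most N names, before any dependency resolution. The function
name chunk_packages is made up, and this is not how install_packages is
implemented; dependencies pulled in by each chunk would still come on top.

```shell
# Print the given package names in lines of at most $1 names each.
chunk_packages() {
    max=$1
    shift
    printf '%s\n' "$@" | xargs -n "$max" echo
}

chunk_packages 2 rstatd rusers rusersd sudo wakeonlan
```

Each output line could then be handed to a separate install call.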

Cheers,
 Steffen

Re: $MAXPACKAGES

2010-12-16 Thread Steffen Grunewald

On Thu, Dec 16, 2010 at 01:23:03PM +0200, Toomas Tamm wrote:
> 
> You can move the apt cache to a NFS-mounted disk with enough space.
> A gigabit ethernet connection is highly recommended.
> 
>Also, only one installation may be in progress at
> any given time: the install cache is shared between the hosts.

Bad idea.
My typical situation is tens of install clients.
And I don't see the point of reading files via the network (HTTP, or NFS)
just to write them back over the network.

On Thu, Dec 16, 2010 at 12:41:25PM +0100, Thomas Lange wrote:
> If you use NFS access to your Debian mirror, the packages will not need
> any space under /var.

That's a good point. Now if I only could remember why I didn't do this before...

Cheers
 Steffen

Package alternatives?

2011-05-25 Thread Steffen Grunewald

We're in the process of replacing some packages ("A") by specially built 
ones ("B"), which will be named slightly differently, to avoid confusion.
For FAI this means that we want to
"install package B if it already exists, otherwise use A".
How can this be achieved?
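
One conceivable approach, as a rough sketch only: decide in a script which
of the two names to install. It assumes that "apt-cache show" exits non-zero
for a package the repositories don't know (worth verifying, in particular
for virtual packages); pkg_available and pick_package are hypothetical
helper names.

```shell
# True if apt knows a package of that name (assumption, see above).
pkg_available() {
    apt-cache show "$1" >/dev/null 2>&1
}

# Print the preferred package if available, otherwise the fallback.
pick_package() {
    if pkg_available "$1"; then
        echo "$1"
    else
        echo "$2"
    fi
}

# Usage idea, inside the target system:
#   $ROOTCMD apt-get -y install "$(pick_package B A)"
```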

Cheers,
 Steffen

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7274,fax:7298}

Additional checks in sysinfo?

2011-11-10 Thread Steffen Grunewald

Hi,

I'd like to run a couple more checks in fai-sysinfo.
Is there an alternative to putting everything into class/* scripts?
(I'd have to load a couple modules, but don't want them to interfere
with subsequent install steps... and rmmod perhaps won't catch all 
dependencies pulled in by modprobe.)
Is there something I can check for so the corresponding code will be
run only in sysinfo mode?
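
One candidate would be the FAI_ACTION variable. A sketch, assuming it is
already set to "sysinfo" at the point where the script runs; the function
name in_sysinfo_mode is made up.

```shell
# True only when FAI was started with FAI_ACTION=sysinfo.
in_sysinfo_mode() {
    [ "${FAI_ACTION:-}" = "sysinfo" ]
}

if in_sysinfo_mode; then
    # sysinfo-only checks would go here (module loading, hardware probes)
    echo "running additional sysinfo checks"
fi
```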

Cheers,
 Steffen

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * - * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7274,fax:7298}

Re: "clone" an example client?

2011-12-08 Thread Steffen Grunewald

On Wed, Dec 07, 2011 at 05:15:15PM +0100, Frank Lienhard wrote:
> I have an i386 client, which I installed manually and I'm now
> wondering how to setup future clients "identical" to that with FAI
> 
> at least to have the package selection transfered to the FAI config
> would save a lot of work, I think

Use debtree to identify the "leaf packages" so you don't have to put
all and everything into the package list.

Identify the files (supposedly mainly in /etc) which have been modified
during or after installation, and convert them into a fcopy-able tree.
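
For the second step, a simple sketch: list the files that are newer than a
reference file. It assumes you can name a file whose modification time marks
the end of the manual installation; the function name modified_since is
hypothetical, and the result still needs manual review.

```shell
# List regular files below directory $2 that are newer than file $1.
modified_since() {
    find "$2" -type f -newer "$1" | sort
}

# Usage idea (the reference file is a placeholder):
#   modified_since /path/to/reference-file /etc
```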

S

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * - * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7274,fax:7298}

Re: gpt-bios from setup-storage - does it really work?

2011-12-20 Thread Steffen Grunewald

On Tue, Dec 20, 2011 at 01:37:24PM +0100, Carsten Aulbert wrote:
> Hi
> 
> I'm trying to partition a machine with the following disk configuration:
> 
> disk_config /dev/disk/by-id/scsi-3600050e0f065*[A-Z0-9_][A-Z0-9_][A-Z0-9_] 
> fstabkey:uuid disklabel:gpt-bios bootable:1
> 
> primary  /boot 256  ext2 rw,errors=remount-ro 
> primary  / 1   ext3  rw
> primary  swap  8192swap sw
> primary  /var 2xfsrw
> primary  /opt 5xfsrw
> primary  /tmp 2048 xfsrw
> primary  /local1000-   xfs rw
> 
> all seems fine, but GRUB won't start after the reboot as is not able to reach 
> the gpt-bios "partition":
> 
> GNU Parted 2.3
> Using /dev/sdb
> Welcome to GNU Parted! Type 'help' to view a list of commands.
> (parted) p
> Model: AMCC 9690SA-4I DISK (scsi)
> Disk /dev/sdb: 3000GB
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
> 
> Number  Start   End SizeFile system Name Flags
>  1  32.3kB  268MB   268MB   ext2primary  boot

32.3KB - does this mean "63 sectors"? AFAIK a GPT needs 34 sectors,
both at the beginning and the end of the disk...

>  2  268MB   10.8GB  10.5GB  ext3primary
>  3  10.8GB  19.3GB  8590MB  linux-swap(v1)  primary
>  4  19.3GB  40.3GB  21.0GB  xfs primary
>  5  40.3GB  92.7GB  52.4GB  xfs primary
>  6  92.7GB  94.9GB  2147MB  xfs primary
>  7  94.9GB  3000GB  2905GB  xfs primary
>  8  3000GB  3000GB  123kB   primary  bios_grub
> 
> as this one is beyond the 2TB "border"...
> 
> Is there a nice way to "reserve" this partition right at the beginning?

What about splitting the large volume into two at the raid controller level,
and having a "normal" boot partition (and a "normal" grub installation at
the beginning of a small-sized volume)?

S

Re: NIS client installation with FAI

2012-07-20 Thread Steffen Grunewald

On Fri, Jul 20, 2012 at 09:45:39AM +0200, Katarzyna Myrek wrote:
> Hi
> 
> I managed to install NIS with NFS via FAI class. Here are files I am using:
> 
> 1. /srv/fai/config/files/root/nis_install
> nis nis/domain  string  YOUR_NIS_DOMAIN
> 
> This file will help install package nis (holding off any prompts in post
> install aptitude scripts).
> 
> 2. Script included in class which installs nis:
> #! /bin/bash
> # © 2012 Katarzyna "Olivia" Myrek
> 
> if [ $FAI_ACTION = "install" ]; then
> fcopy -BMi /root/nis_install
> echo "Copied nis_install"
> $ROOTCMD debconf-set-selections /root/nis_install
> $ROOTCMD aptitude install -y nis
> rm /target/root/nis_install
> ##Those lines need a little security tweaking
> echo "NFSSERVER:/export/home/home   nfs
> defaults,rw,nodev,nosuid,rsize=32768,wsize=32768,_netdev,tcp 0 0" >>
> /target/etc/fstab
> echo "+::" >> /target/etc/passwd
> echo "+::" >> /target/etc/shadow
> echo "+::" >> /target/etc/group
> echo "+::" >> /target/etc/gshadow

Wouldn't a "compat" entry in /etc/nsswitch.conf mean the same as the
4 lines above?

> fcopy -BMi /etc/defaultdomain
> fcopy -BMi /etc/default/nis
> fcopy -BMi /etc/yp.conf
> fcopy -BMi /etc/nsswitch.conf
> fcopy -BMi /etc/profile
> fcopy -BMi /var/yp/Makefile ##not needed?
> fcopy -BMi /usr/local/bin/passwd

BTW, /target should better read ${target} to allow softupdates...

> 
> fi
> 
> I hope this will someday help someone install nis via class;).
> 
> 
> Regards,
> Katarzyna Myrek
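
To illustrate the ${target} remark: a sketch of a small helper that appends
a line to a file below ${target} instead of a hard-coded /target. The helper
name append_line is made up; the fallback to /target is only there so the
snippet also works where the variable is unset.

```shell
# ${target} is expected to be provided by FAI; fall back to /target.
target=${target:-/target}

# Append line $2 to file $1, where $1 is a path inside the installed system.
append_line() {
    printf '%s\n' "$2" >> "${target}$1"
}

# Usage idea:
#   append_line /etc/passwd "+::"
```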

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * - * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7274,fax:7298}

fai-guide for 4.0.3?

2012-08-01 Thread Steffen Grunewald

The fai-doc packages on jenkins and in the official wheezy repositories 
still contain documentation for 3.0.6 - will this be changed? I'm not 
aware of a corresponding bug filed against fai-doc...

S

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * - * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7274,fax:7298}

Re: Subject: Wheezy nfsroot / FAI 4.0.3

2012-08-07 Thread Steffen Grunewald

On Fri, Jul 20, 2012 at 07:51:01AM +0200, Bjarne Bertilsson wrote:
> 
> live-boot3.0(~a35-1) is broken on wheezy. If you run ipconfig in the
> initramfs shell the network will properly come up. There is
> a bug report on this somewhere but weren't able to find it just now
> with a quick search. I fixed it temporarily by pinning an earlier
> version of the package from snapshot.debian.org.

I'm trying the two patches mentioned in bug #683240 now - as long as there's
no proper documentation for the live-boot -> dracut transition, I prefer to
stay on known terrain.

Unfortunately, this involved patching the existing live-boot package (a38), 
rebuilding the debs, and re-running the nfsroot build - which will not 
finish in time before I have to leave. But I'm optimistic.

Steffen

Re: wheezy fai sysinfo reboots without reboot flag

2012-08-14 Thread Steffen Grunewald

On Tue, Aug 14, 2012 at 02:11:30PM +0200, Mathieu Alorent wrote:
> Le 13/08/2012 21:33, Thomas Neumann a écrit :
> >On Mon, 13 Aug 2012 12:14:19 -0500
> >Brian Kroth  wrote:
> >>Hi all, I'm testing fai+wheezy (4.0.3) and have a client setup to pxeboot 
> >>to a
> >>sysinfo without the reboot flag[...]
> >Check the list for an earlier message of mine. I've run into the same 
> >problem.
> >
> >If no error occurs, then fai _will_ reboot the system. You can tell fai to 
> >halt instead, but you can't simply make it stop and do nothing. Reboot or 
> >Halt.
> >
> >Solution:
> >a) provoke an error somewhere during execution
> >b) use my attached patch to force a non-reboot
> >
> >bye
> >thomas
> I got the same problem, I added a new flag "noreboot" in subroutines...
> 
> https://github.com/kumy/fai/commit/ef3d68991c69b09343b110c8d8e857501b588f28

So this creates a kind of ternary logic: reboot, noreboot, and something
in between. Which confuses me.
I'm still trying to understand why the reboot logic was changed since fai 3.x
- which just slept for a while if there was an error (or something that looked
like one), and after that did as told (reboot if flag "reboot" was set, or
sit there if not).
It shouldn't have been too hard to integrate the new "halt" flag into this,
so I suppose there was a reason to "deliberately break" the old behaviour...
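
What I would have expected, as a sketch: the final action is derived from
the flags alone, with "wait" as the default. The function name final_action
is made up and this is not taken from the FAI subroutines; it only spells
out the behaviour argued for above.

```shell
# Decide what to do at the end of a run from a comma-separated flag list.
final_action() {
    case ",$1," in
        *,reboot,*) echo reboot ;;
        *,halt,*)   echo halt ;;
        *)          echo wait ;;    # neither flag given: just sit there
    esac
}

final_action verbose,sshd,createvt
final_action verbose,sshd,reboot
```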

Steffen

Re: wheezy fai sysinfo reboots without reboot flag

2012-08-15 Thread Steffen Grunewald

On Tue, Aug 14, 2012 at 05:24:44PM +0200, Thomas Neumann wrote:
> Hello
> 
> On Tue, 14 Aug 2012 16:31:44 +0200
> Steffen Grunewald  wrote:
> > I'm still trying to understand why the reboot logic was changed since fai 
> > 3.x
> > - which just slept for a while if there was an error (or something that 
> > looked
> > like one), and after that did as told (reboot if flag "reboot" was set, or
> > sit there if not).
> 
> I checked the recent 3.x versions and it wasn't changed. However at least for 
> me there always occurred an error regarding the pcspeaker module, so if I 
> didn't specify 'reboot' it would wait for a confirmation ('Press Enter to 
> continue').

Okay, my /usr/lib/fai/subroutines for squeeze is dated Dec 24, 2010.
I'm too lazy to find the commit that changed the behaviour - but my error.log
files from squeeze installs are about 10k in size, most of the messages being
non-critical. I certainly won't press ENTER for 600 machines.

S

Re: Subject: Wheezy nfsroot / FAI 4.0.3

2012-08-17 Thread Steffen Grunewald

On Tue, Aug 07, 2012 at 07:04:16PM +0200, Steffen Grunewald wrote:
> On Fri, Jul 20, 2012 at 07:51:01AM +0200, Bjarne Bertilsson wrote:
> > 
> > live-boot3.0(~a35-1) is broken on wheezy. If you run ipconfig in the
> > initramfs shell the network will properly come up. There is
> > a bug report on this somewhere but weren't able to find it just now
> > with a quick search. I fixed it temporarily by pinning an earlier
> > version of the package from snapshot.debian.org.
> 
> I'm trying the two patches mentioned in bug #683240 now - as long as there's
> no proper documentation for the live-boot -> dracut transition, I prefer to
> stay on known terrain.
> 
> Unfortunately, this involved patching the existing live-boot package (a38), 
> rebuilding the debs, and re-running the nfsroot build - which will not 
> finish in time before I have to leave. But I'm optimistic.

I shouldn't have been too optimistic.

While the network device config patches apparently worked, I'm still having
major issues.

- My PXE files read like this:
label fai-generated
kernel vmlinuz-install.64.wheezy
append initrd=initrd-install.64.wheezy ip=dhcp \
 root=/dev/nfs nfsroot=/srv/fai/nfsroots/amd64.wheezy \
 boot=live console=tty0 \
 FAI_FLAGS=verbose,sshd,createvt FAI_ACTION=install
(of course, all the append stuff is in one line)

- The TFTP/DHCP setup hasn't been changed for a long time

- Watching a machine boot via IPMI Serial-over-LAN stalls the boot at some
point. Apparently the network port (through which the IPMI connection is
tunneled) gets confused.
  I had to remove the console=ttyS1,... entry from the PXE file.

- I am thrown into initramfs after some errors scrolling by. Here's what
I see (had to copy it to paper by hand, for the above reason):
...
Begin: Loading essential drivers...
Begin: Running /scripts/init-premount
Begin: Mounting root file system...
Waiting for ethernet card(s) up
Looking for a connected Ethernet interface ... eth0? eth1?
... eth0 becomes ready
Connected eth0 found
IP-Config: ... (correct IP, DNS, etc)
 rootserver: ${my_fai_server} rootpath: (empty!)
 filenam: pxelinux.0
Creating /etc/resolv.conf
Begin: Trying netboot from :/srv/...
Begin: Trying nfsmount -o nolock -o ro :/srv/...
nfsmount: can't parse IP address ''

In the initramfs, I checked:
/run/net-eth0.conf contains the right ROOTSERVER setting, and PROTO='dhcp',
but an empty ROOTPATH=''
/conf/param.conf says DEVICE=' eth0' (yes, with a leading blank)
/conf/initramfs.conf contains DEVICE='' and NFSROOT=auto


Something seems to stop propagation of the ROOTSERVER setting to the 
9990-netboot.sh script of live-boot... (and ROOTPATH not being used)
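
What I would expect to happen, as a sketch: an nfsroot= value without a
server part gets the DHCP-provided ROOTSERVER prepended. The function name
resolve_nfsroot is made up and this is not live-boot code; the address below
is a placeholder.

```shell
# $1 = nfsroot= value from the kernel command line
# $2 = ROOTSERVER as found by DHCP (e.g. from /run/net-eth0.conf)
resolve_nfsroot() {
    case $1 in
        *:*) echo "$1" ;;       # server already given explicitly
        *)   echo "$2:$1" ;;    # prepend the DHCP-provided server
    esac
}

resolve_nfsroot /srv/fai/nfsroots/amd64.wheezy 192.0.2.1
```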

I'd like to make local modifications to the startup sequence (adding debug
statements), and rebuild the initrd from inside the nfsroot 
- any tips how to do this, without having to *build* a new live-boot package,
and creating a new nfsroot?
- any suggestions which additional settings to use?

S

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * - * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7274,fax:7298}

Re: Subject: Wheezy nfsroot / FAI 4.0.3

2012-08-20 Thread Steffen Grunewald

On Fri, Aug 17, 2012 at 05:55:37PM +0200, Thomas Lange wrote:
> > On Fri, 17 Aug 2012 17:47:17 +0200, "Andreas B. Mundt" 
> >  said:
> 
> > After that, things started working.  However, fai fails with "No URL
> > defined for config space" ...
> I guess you did not defined FAI_CONFIG_SRC on the kernel command line.
> Use fai-chboot -u for this.
> 
> > The problem might be solved by adding the appropriate line in fai.conf
> > in the chroot. 
> This is deprecated and not supported any more.

In this case, the manpage of fai-chboot should be changed: it still suggests 
that

-u URL Set FAI_CONFIG_SRC to URL. If not set the value from fai.conf inside the 
nfsroot will be used.

S

Re: Subject: Wheezy nfsroot / FAI 4.0.3

2012-08-21 Thread Steffen Grunewald

On Mon, Aug 20, 2012 at 04:17:06PM +0200, Sven Ulland wrote:
> On 08/20/2012 04:07 PM, Brian Kroth wrote:
> >All that said, I still had to apply this patch to get the aufs
> >mounting order stuff to work out correctly:
> >http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=681579
> 
> I worked around this by simply specifying mount option nfsvers=3 in
> the pxe config (no authoritative sources of aufs/nfsv4/kernel bugs
> on this, sorry!). So the full line would be:
> 
> label fai-generated
> kernel vmlinuz-3.2.0-3-amd64
> append initrd=initrd.img-3.2.0-3-amd64 ip=dhcp aufs
> root=nfs:1.1.1.1:/srv/fai/nfsroot-amd64-wheezy:nfsvers=3
> FAI_FLAGS=verbose,sshd,reboot FAI_ACTION=install
> 
> This and the fai-make-nfsroot patch in
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=682013#20 seems to be
> all I needed to get a plain wheezy fai server installed and ready for
> installing wheezy clients.

Thanks for this suggestion. I started a fresh attempt, with
- nfs-common installed to the nfsroot (had been there for a while)
- live-boot 3.0~b1 replacing patched ~a38
- /etc/fai/fai.conf copied into nfsroot
- /usr/share/fai/subroutines patched to deliberately ignore errors
- your pxe config above

- and failed.
I had to use the following pxe config:

default fai-generated

label fai-generated
kernel vmlinuz-install.64.wheezy
append initrd=initrd-install.64.wheezy ip=dhcp boot=live root=/dev/nfs
nfsroot=10.100.200.98:/srv/fai/nfsroots/amd64.wheezy
FAI_FLAGS=verbose,sshd,createvt,reboot FAI_ACTION=install

(no aufs; no nfsvers; boot=live seems to be essential while console=tty0
isn't. With the exception of the server IP, it's pretty close to my
previous squeeze setup)

and with some minor exceptions (one of my old scripts uses discover, which
isn't there anymore, and old, previously used disk layouts don't match so
the partition step ends with error 710 - stuff which I will look into later)

*it works*!

I don't like the IP address of the server in the pxe file, but things could
be worse.

Going to check the 71 lines of error.log now...


Steffen

Re: Subject: Wheezy nfsroot / FAI 4.0.3

2012-08-22 Thread Steffen Grunewald

On Tue, Aug 21, 2012 at 03:10:11PM +0200, Steffen Grunewald wrote:
>
> Thanks for this suggestion. I started a fresh attempt, with
> - nfs-common installed to the nfsroot (had been there for a while)
> - live-boot 3.0~b1 replacing patched ~a38
> - /etc/fai/fai.conf copied into nfsroot
> - /usr/share/fai/subroutines patched to deliberately ignore errors
> - your pxe config above
>
> - and failed.
> I had to use the following pxe config:
>
> default fai-generated
>
> label fai-generated
> kernel vmlinuz-install.64.wheezy
> append initrd=initrd-install.64.wheezy ip=dhcp boot=live root=/dev/nfs 
> nfsroot=10.100.200.98:/srv/fai/nfsroots/amd64.wheezy 
> FAI_FLAGS=verbose,sshd,createvt,reboot FAI_ACTION=install
>
> (no aufs; no nfsvers; boot=live seems to be essential while console=tty0
> doesn't, with the exception of the server IP it's pretty close to my
> previous squeeze setup)
>
> *it works*!
>
> Going to check the 71 lines of error.log now...

I managed to get my error.log down to zero size, but there's still an
issue I had discovered before: "something" is blocking outside access
to the IPMI interface.
With a SOL session running, I wouldn't get beyond the very initial steps
of system initialisation (last things I saw were HD detection related),
so it probably isn't a FAI issue but something related to the initrd 
and/or tg3 driver.
After rebooting into the default Wheezy kernel, I can access the IPMI
via the host interface, but not using lanplus. I cannot even ping the
IP (which has been properly set during FAI setup, and again by rc.local
- confirmed by "ipmitool lan print 1" and messages sent to syslog).

Anyone else seen this behaviour? Any hints how to fix this, if possible?

S

IPMI problem with FAI and wheezy

2012-09-07 Thread Steffen Grunewald

Hi,

I'm at my wits' end now with this old system, perhaps one of you can come
up with another idea:

The hardware is somewhat old, SuperMicro H8SSL board with IPMI card (BMC)
looped into eth0 (Broadcom Tigon3).

Excerpts from the dmesg file:
[0.00] Linux version 3.2.0-3-amd64 (Debian 3.2.23-1) (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1 SMP Mon Jul 23 02:45:17 UTC 2012
[0.00] ACPI: FACP 7ffe0290 000F4 (v03 A M I  OEMFACP  12000606 MSFT 0097)
[0.00] ACPI: DSDT 7ffe0410 033A8 (v01  0ABSW 0ABSW005 0005 INTL 02002026)
[0.884954] tg3 :02:03.0: eth0: Tigon3 [partno(BCM95704A6) rev 2100] (PCIX:133MHz:64-bit) MAC address xx:xx:xx:xx:xx:xx

I used to set "console=ttyS1,19200n1" in the pxelinux.cfg file, and watch
FAI running via serial-over-LAN, but that stops right at the beginning -
and the IPMI card cannot be reached afterwards, not by rebooting, nor by
applying other tricks. The only way to get the connection back is power-
cycling the whole box.

This behaviour did not show up with Squeeze (2.6.32-5 kernel).

I'm suspecting a change in the handling of the eth0/BMC bridge by the tg3
driver, but that's only part of the story: it gets worse.

Trying to shut down the machine (actually, a whole set of machines, all
behaving the same, so it's not a single fault), by running "shutdown -h now", 
will not halt but reboot it.
The only way to reliably switch it off seems to be to run "ipmitool chassis
power soft", then "shutdown -h now".
The machine will then stay off for exactly 24 hours, then magically restart.

Needless to say, I didn't change any BIOS settings, nor did I implement any
kind of watchdog on the BMC.

Is there anything I can do to nail down the problem?

Thank you in advance for your suggestions.

Steffen

Re: IPMI problem with FAI and wheezy

2012-11-12 Thread Steffen Grunewald

On Fri, Sep 07, 2012 at 03:02:05PM +0200, Steffen Grunewald wrote:
> Hi,
> 
> I'm at my wits' end now with this old system, perhaps one of you can come
> up with another idea:
> 
> The hardware is somewhat old, SuperMicro H8SSL board with IPMI card (BMC)
> looped into eth0 (Broadcom Tigon3).
> 
> Excerpts from the dmesg file:
> [0.00] Linux version 3.2.0-3-amd64 (Debian 3.2.23-1) 
> (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-8) ) #1 SMP 
> Mon Jul 23 02:45:17 UTC 2012
> [0.00] ACPI: FACP 7ffe0290 000F4 (v03 A M I  OEMFACP  
> 12000606 MSFT 0097)
> [0.00] ACPI: DSDT 7ffe0410 033A8 (v01  0ABSW 0ABSW005 
> 0005 INTL 02002026)
> [0.884954] tg3 :02:03.0: eth0: Tigon3 [partno(BCM95704A6) rev 2100] 
> (PCIX:133MHz:64-bit) MAC address xx:xx:xx:xx:xx:xx
> 
> I used to set "console=ttyS1,19200n1" in the pxelinux.cfg file, and watch
> FAI running via serial-over-LAN, but that stops right at the beginning -
> and the IPMI card cannot be reached afterwards, not by rebooting, nor by
> applying other tricks. The only way to get the connection back is power-
> cycling the whole box.
> 
> This behaviour did not show up with Squeeze (2.6.32-5 kernel).
> 
> I'm suspecting a change in the handling of the eth0/BMC bridge by the tg3
> driver, but that's only part of the story: it gets worse.

Actually, the problem has gone away with the latest (3.2.32 vs 3.2.23) kernel
now available for Wheezy.

S

Re: install to usb mass storage

2013-03-13 Thread Steffen Grunewald

On Wed, Mar 13, 2013 at 08:32:26AM +0100, Natxo Asenjo wrote:
> On Tue, Mar 12, 2013 at 4:03 PM, Thomas Lange
>  wrote:
> >> On Tue, 12 Mar 2013 16:00:03 +0100, Natxo Asenjo 
> >>  said:
> >
> > > Kickstarting centos works to usb mass storage, FAI stops.
> > > /proc/partitions has the right info, but it stops.
> > Does /tmp/fai/variables.log contains a line like this?
> > disklist='sda '
> 
> no, the variable is not defined. That must be it.
> 
> How does it get defined? Obviously the kernel sees sda (probably after
> the variables get filled).

Get the usb-storage module loaded earlier, before the disks are counted?

S
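
A minimal sketch of that idea: load the module, then wait until the kernel
lists the disk before anything counts disks. This is an assumption about how
one might script it, not FAI's own logic; the function name wait_for_disk and
the retry count are made up for illustration.

```shell
# Hypothetical helper: wait until a block device name shows up in a
# /proc/partitions-style file (the 4th column holds the device name).
# Arguments: device name, number of one-second tries, file to read.
wait_for_disk() {
    name=$1
    tries=$2
    file=${3:-/proc/partitions}
    while [ "$tries" -gt 0 ]; do
        if awk -v d="$name" '$4 == d { found = 1 } END { exit !found }' "$file"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

In a hook this might be called as "modprobe usb-storage; wait_for_disk sda 10".
Whether the disk list is rebuilt afterwards depends on where in the FAI run
the hook executes, which this thread does not settle.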

Re: Setup-storage use space between partitions

2013-05-03 Thread Steffen Grunewald

On Thu, May 02, 2013 at 05:16:16PM +0200, Steven Wend wrote:
> Hello guys,
> 
> I have only one hdd which has three partitions as shown below.
> 
> -
> | 1. NTFS part | 2. NTFS part  |  2. NTFS part  |
> -
> 
> Now I want to keep the first and the third partition. The second one
> has to be deleted and the resulting free space should be used to install
> Ubuntu with FAI.
> 
> I tried to handle this with this disk_config:
> 
> disk_config disk1 preserve_always:1,3 disklabel:msdos bootable:1 fstabkey:uuid
> 
> primary /  16000  ext3  rw,noatime,errors=remount-ro
> logical swap   4000   swap  rw
> logical /home  1-50%  ext3  rw,noatime,nosuid,nodev createopts="-L home -m 1" tuneopts="-c 0 -i 0"
> 
> 
> The first and third partitions are marked with preserve_always.
...
> Is this a bug or do I have a logical problem?

"logical" hits the nail on its head - your partitions are not numbered
1, 2, 3 - but (very probably) 1, 5, 6 (with an "extended" partition 
sitting in between, supposedly at 2). Check with "fdisk -lu" (from 
inside a "sysinfo" run, for example)

S
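
The numbering rule behind this can be sketched in a few lines. On an msdos
disk label, numbers 1 to 4 are reserved for primary and extended partitions,
and logical partitions start at 5. The function name msdos_part_kind is
hypothetical and only illustrates the rule; it does not inspect a real disk.

```shell
# On an msdos (DOS) disk label, numbers 1-4 are primary or extended
# slots; logical partitions are always numbered from 5 upwards.
msdos_part_kind() {
    if [ "$1" -ge 1 ] && [ "$1" -le 4 ]; then
        echo "primary-or-extended"
    elif [ "$1" -ge 5 ]; then
        echo "logical"
    else
        echo "invalid"
    fi
}
```

So a preserve list of "1,3" can only refer to primary (or extended) slots,
while anything declared "logical" would end up at 5 or above.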

Re: Setup-storage use space between partitions

2013-05-03 Thread Steffen Grunewald

On Fri, May 03, 2013 at 11:39:01AM +0200, Steven Wend wrote:
> Hello Steffen,
> 
> thanks for your answer. First I had to update my "picture":
> -
> | 1. NTFS part | 2. NTFS part  |  3. NTFS part  |
> |PRIMARY   |PRIMARY|PRIMARY |
> -
> 
> The initial partition table (fdisk output) is:
> ---
> Disk /dev/sda: 171.8 GB, 171798691840 bytes
> 255 heads, 63 sectors/track, 20886 cylinders, total 335544320 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0xaad98181
> 
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1            2048   268437503   134217728    7  HPFS/NTFS/exFAT
> /dev/sda2       268437504   334899199    33230848    7  HPFS/NTFS/exFAT
> /dev/sda3   *   334899200   335513599      307200    7  HPFS/NTFS/exFAT
> ---

Okay, so you got it the other way 'round - with three _primary_
partitions.
Wouldn't that require that you list them in your disk_config as such
then, not "logical"?

S

Re: Installing fai-server Without isc-dhcp-server

2013-07-11 Thread Steffen Grunewald

On Wed, Jul 10, 2013 at 09:51:16PM -0400, n43w79 wrote:
> Q. Is there a way NOT to install isc-dhcp-server, as I already have dnsmasq on 
> our network?

Install it, but leave it unconfigured if there's a dnsmasq running somewhere 
else?

S

Re: Using a newer kernel (than wheezy) and dracut

2013-11-05 Thread Steffen Grunewald

On Mon, Nov 04, 2013 at 09:49:09PM +0100, Carsten Aulbert wrote:
> 
> however, installations always stopped prior to mounting the NFSroot with
> "mount.nfs: Protocol not supported". It seems that one needs to manually
> add the kernel drivers nfsv{2,3,4} among others when using a kernel 3.6
> or later *or* upgrade to a much newer dracut than the one in wheezy
> (020) [2].
> 
> A simple backport of the current 034 version from jessie works fine as well.

As live-boot seems to resolve the "multiple interfaces with DHCP" issue,
I haven't migrated to dracut (yet). It turns out that, to install the
current sid/jessie kernel (3.11-1), it's sufficient to also upgrade
the initramfs-tools (to 0.110 or higher, Wheezy is at 0.109).
No backport needed at all.
Checking the initrd contents, I found all the nfsv[234] modules, as
well as aufs and other stuff.
Hopefully that's all that's needed to get FAI running even with newer
kernels. (Of course, one has to provide the linux-image and initramfs-tools
packages for installation.)

> Hopefully this will help others to save some time ;)

It would have saved even more time if there wasn't a new Intel eth i/f
every week...

- S
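
The initrd check mentioned above can be repeated with a small filter. This is
only a sketch: nfs_modules is a made-up name, and it expects a plain list of
file names on stdin, such as the one "lsinitramfs" from initramfs-tools
prints.

```shell
# Filter a list of initrd file names (one per line on stdin) for the
# NFS protocol modules nfsv2, nfsv3 and nfsv4, compressed or not.
nfs_modules() {
    grep -E '(^|/)nfsv[234]\.ko(\.[a-z0-9]+)?$'
}
```

A call might look like "lsinitramfs /path/to/initrd.img | nfs_modules", where
the initrd path is a placeholder. No output means none of the three modules
made it into the image.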

Re: fai ssh logs

2013-12-04 Thread Steffen Grunewald

On Wed, Dec 04, 2013 at 02:00:16PM +0100, linux-service.be bvba wrote:
> I am not able to get the logs via ssh from client to server anymore.
> It used to work earlier.
> What is the correct procedure, or can I add a script in the config space

As there's a multitude of possible reasons, probably the best way
to find out is to let FAI stop after running (do not use "halt" or 
"reboot" in FAI_FLAGS), and run "ssh -vvv fai@$faiserver" by hand.
(I have seen: disappeared fai account; wrong permissions of .ssh/;
missing known_hosts entries (*); ...)

(*) Perhaps it'd make sense to run ssh without strict host key checking?

- S
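
One of the causes listed above, wrong permissions on .ssh/, is easy to test
for. The sketch below assumes GNU stat (as shipped with Debian) and treats
the conventional mode 0700 as the expected value; check_ssh_dir is a
hypothetical name.

```shell
# Report whether a directory has the conventional 0700 mode for ~/.ssh.
# Uses GNU stat ("-c %a" prints the octal permission bits).
check_ssh_dir() {
    mode=$(stat -c %a "$1") || return 2
    if [ "$mode" = "700" ]; then
        echo "ok"
    else
        echo "mode $mode, expected 700"
        return 1
    fi
}
```

On the FAI server this could be run against the fai account's .ssh directory.
For the known_hosts case, "ssh -vvv -o StrictHostKeyChecking=no
fai@$faiserver" is one way to take host key checking out of the picture
while debugging.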

Re: fai ssh logs

2013-12-04 Thread Steffen Grunewald

On Wed, Dec 04, 2013 at 02:26:14PM +0100, linux-service.be bvba wrote:
> Maybe it's my own fault, but do I always have to rebuild nfsroot after a 
> change in the config space?

No. Use the FAI_CONFIG_SRC variable in your PXE file like this:
  FAI_CONFIG_SRC=nfs://faiserver/srv/fai/faiconfig.xyz

- S
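
As an illustration, a PXE stanza carrying FAI_CONFIG_SRC could be generated
like this. The layout is modelled on the wheezy PXE file quoted earlier in
this archive; the function name pxe_stanza and its argument order are
assumptions, and the boot parameters may need adjusting for other setups.

```shell
# Print a pxelinux stanza whose append line carries FAI_CONFIG_SRC.
# Arguments: kernel, initrd, nfsroot (server:/path), config space URL.
pxe_stanza() {
    printf '%s\n' \
        "default fai-generated" \
        "" \
        "label fai-generated" \
        "kernel $1" \
        "append initrd=$2 ip=dhcp boot=live root=/dev/nfs nfsroot=$3 FAI_CONFIG_SRC=$4 FAI_FLAGS=verbose,sshd,createvt,reboot FAI_ACTION=install"
}
```

Since the config space is fetched at install time from that URL, editing
files below it takes effect on the next install without touching the nfsroot.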

Select package version in package_config?

2014-01-20 Thread Steffen Grunewald

I've been trying (and failed) to install Debian Jessie via FAI,
with both Wheezy and Jessie repositories listed in sources.list.

For some - yet unknown - reason, grub-pc installed itself, but
was unable to load the (also properly created) initrd (the kernel
loaded fine, but complained about "bad format" of /sbin/init).

With Wheezy only, and the grub* versions locked to what Wheezy 
offers (1.99-27+deb7u2) the problem disappeared - but upgrading
the machine manually to Jessie was a time-consuming task.
I'd love to let FAI do *all* the work.
Until the reason of the grub problem has been identified, I'd like
to set the version for all 5 grub* packages to 1.99-27+deb7u2 
explicitly (or, to cover the chance of the Wheezy package being
updated again, to << 2.0) - is this possible within FAI?

- S

-- 
Steffen Grunewald * Cluster Admin * steffen.grunewald(*)aei.mpg.de
MPI f. Gravitationsphysik (AEI) * Am Mühlenberg 1, D-14476 Potsdam
http://www.aei.mpg.de/ * --- * +49-331-567-{fon:7274,fax:7298}

Re: Select package version in package_config?

2014-02-14 Thread Steffen Grunewald

On Wed, Jan 22, 2014 at 01:07:56PM +0100, Thomas Lange wrote:
> >>>>> On Mon, 20 Jan 2014 13:03:42 +0100, Steffen Grunewald 
> >>>>>  said:
> 
> > Until the reason of the grub problem has been identified, I'd like
> > to set the version for all 5 grub* packages to 1.99-27+deb7u2 
> > explicitly (or, to cover the chance of the Wheezy package being
> > updated again, to << 2.0) - is this possible within FAI?
> You can do this with apt pinning, but don't ask me about the details.
> It's also possible to say packagename/distribution (like
> grub-pc/wheezy) in the package_config files.

I've just come across another quirk which needs unconventional handling:
it's the hdf5 package set which comes without libhdf5-serial-dev.
The accepted workaround is to install that package from a Squeeze repo,
put it on hold, and install a tweaked (build -9.1) version of Wheezy's
libhdf5-7 over it.

I presume that the most adequate way to represent this in FAI would be
a hook, right?
When would this best happen, just after unpacking the base package? 
Would FAI obey the "hold" setting?

Thanks,
- S
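
For the apt pinning route mentioned in the quoted reply, a preferences stanza
along these lines should work. It is a sketch, not something tested inside
FAI: write_grub_pin is a hypothetical helper, and how the file gets into the
target before the package installation step is left open. A priority above
1000 lets apt downgrade to the pinned version if necessary.

```shell
# Write an apt preferences stanza that pins all grub* packages to one
# version. $1 is the output file, $2 the version string to pin.
write_grub_pin() {
    printf '%s\n' \
        "Package: grub*" \
        "Pin: version $2" \
        "Pin-Priority: 1001" > "$1"
}
```

Example: "write_grub_pin /etc/apt/preferences.d/grub 1.99-27+deb7u2" (the
file name is an assumption). A glob such as "1.99*" in place of the exact
version would also cover later Wheezy updates of the package.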

XFS with external log partition?

2014-03-06 Thread Steffen Grunewald

Has anyone set up a (big) xfs using an external log partition, with
FAI - and would like to share the disk_config?

Thanks,
 Steffen


Re: Can a hook abort a FAI install?

2014-09-15 Thread Steffen Grunewald

On Mon, Sep 15, 2014 at 10:29:04AM +1200, Andrew Ruthven wrote:
> Hey,
> 
> Is it possible to make an install abort from within a hook? Looking at
> the code it doesn't seem obvious.

What would "abort" mean? Leave the PXE boot file in place, and just
sit there and wait? Shut down? 

A couple of years ago I had to "reboot" inside a hook when it detected
the wrong RAID configuration (after fixing that, of course) - to get
the hw detection read the right disk names for the install. I can't
see a reason why a shutdown wouldn't work as well - you'd lose the
install log though. Of course you may make the hook run into a loop 
of beeps suggesting to ssh into the machine, and check the last lines 
of fai.log ... depends on your environment (beeps are hardly audible
if there are 100s of other machines humming next to this one).

> I've added a hook for extrbase to check that a required
> basefile is present (namely Ubuntu ones if we're building an Ubuntu
> box), and if not present it'll spit an error. Ideally I'd just abort the
> install.

Doesn't extrbase do such a check itself, and throw an error code
which can be seen on faimond's output?

- S
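
The check described in the question could be sketched as below. The function
name, the directory layout and the list of archive suffixes are assumptions
for illustration. How FAI reacts to a hook's non-zero exit status is not
confirmed in this thread, so the sketch only reports the result.

```shell
# Hypothetical check: succeed if DIR holds a base file archive for CLASS
# (any of a few common tar compressions), fail with a message otherwise.
basefile_present() {
    dir=$1
    class=$2
    for ext in tar.xz tar.gz tar.bz2 tgz; do
        [ -f "$dir/$class.$ext" ] && return 0
    done
    echo "no basefile for $class in $dir" >&2
    return 1
}
```

A hook could call this and, on failure, fall back to one of the options
discussed above (reboot, shutdown, or a loop that waits for an operator).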

Re: No atapi cdrom found

2014-12-22 Thread Steffen Grunewald

On Mon, Dec 22, 2014 at 03:30:23PM +0100, Thierry Ranson wrote:
> The thing is I've already installed the same kernel with the same cdrom
> (ide cdrom I just add when I need to since I don't need it after install) .
> The mother board was different though...
  ^

So it's not the CDROM driver, but the one for the MB chipset that's
probably not in (or available for) the old kernel... And Thomas was
right, in a sense.

> > Maybe the kernel driver is too old for supporting the new hardware of
> > the cdrom.

Cheers,
 Steffen

Re: install FAI on a special machine

2015-07-07 Thread Steffen Grunewald

On Tue, Jul 07, 2015 at 10:45:25AM +0200, Thierry Granier wrote:
> Hello
> i need your help one more time please!
> 
> i have a machine (master) running Debian 8 with a lot of new
> packages installed and a lot of updates on all the packages.
> this master has 2 disks :
> one for / and swap (all system packages) (disk1)
> another one for /home /opt /var /usr etc... (disk2)
> 
> I have a machine (client) running kali-linux with only 1disk and 100G free.
> 
> i'd like to "clone" the disk1 on the 100 free Gigas on the client
> 
> How can i do that with FAI?

I'm missing the point - why would you want to do this with FAI at all?
Wouldn't rsync (with the -x option) be sufficient?

- S

Re: installing to a wiped disk

2015-07-10 Thread Steffen Grunewald

On Fri, Jul 10, 2015 at 08:47:49AM -0500, John G Heim wrote:
> I have some machines that will PXE boot if the hard disk is not
> bootable. Usually, I make that so by writing zeros to the first
> million blocks. "dd if=/dev/zero of=/dev/sda count=1000000".
> 
> [Note, I know that a million blocks is way more than necessary.]
> 
> -- But that appears to cause a problem for setup-storage. It errors
> out if there is no partition table on the disk. Is there a way to
> get setup-storage to create a partition table?

That must be a problem with your disk_config - I'm sometimes using
the same trick... Can you show some details to the list?

Cheers,
 S
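
To see what such a wipe leaves behind, one can look for the two-byte boot
signature at the end of the first sector. This is only a sketch and not the
check setup-storage itself performs; has_mbr_signature is a made-up name.
Note that GPT disks carry a protective MBR with the same signature, so this
says nothing about which label type is present.

```shell
# Succeed if the first sector of a disk or image ends in the 0x55 0xAA
# boot signature that marks a DOS/MBR partition table.
has_mbr_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

After zeroing the start of the disk, "has_mbr_signature /dev/sda" fails,
which matches the "no partition table" situation described above.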

Re: can't soft reboot after install

2015-07-10 Thread Steffen Grunewald

On Fri, Jul 10, 2015 at 08:44:20AM -0500, John G Heim wrote:
> At the end of an install, my install client has a message that says
> press enter to reboot. But it doesn't work. In fact, issuing a
> shutdown command doesn't work. I've tried "shutdown -r now",
> "reboot". I've tried logging in via ssh and issuing those same
> commands. I have to power cycle the machine to get it to boot into
> the new install. Any ideas?

Use "faireboot" from the command line (you have "sshd" in your 
FAI_FLAGS, I presume?).
Did you add "reboot" to FAI_FLAGS?

- S

Re: can't soft reboot after install

2015-07-10 Thread Steffen Grunewald

On Fri, Jul 10, 2015 at 03:48:33PM +0200, Thomas Lange wrote:
> > On Fri, 10 Jul 2015 08:44:20 -0500, John G Heim  
> > said:
> 
> > At the end of an install, my install client has a message that says
> > press enter to reboot. But it doesn't work. In fact, issuing a shutdown
> > command doesn't work. I've tried "shutdown -r now", "reboot". I've
> > tried logging in via ssh and issuing those same commands. I have to
> > power cycle the machine to get it to boot into the new install. Any
> > ideas?
> If you log in from remote, try fai-reboot.
> But pressing return should work.

I used to ssh into the node during installation, and run
"tail -f /tmp/fai/fai.log" to wait for FAI to finish - which at the end
shows the request to press Enter.

It didn't reboot.

It took me a couple of minutes to find out why... and this happens over and over
again.

- S

Re: installing to a wiped disk

2015-07-10 Thread Steffen Grunewald

On Fri, Jul 10, 2015 at 10:09:52AM -0500, John G Heim wrote:
> 
> >
> >I am using "disklabel:gpt" to create a GPT style partition table. You
> >can also use "disklabel:msdos" to make the old style partition table.
> >This belongs on your "disk_config" line before you specify partitions,
> >file systems, etc.
> 
> Oh, when I heard that, I thought it was a way to label a partition
> -- which makes no sense since it is part of the disk configuration.
> I should have made the connection because I've been going into
> parted and entering a mklabel command to create a partition table.

It's all about the first sector of a direct-access medium.

Some older operating systems (including SunOS) used the term
"disk label" synonymously for "partition table".
There's also the term "master boot record" (MBR) but that already reflects
its major function in a DOS/Windows setup (Suns used firmware for booting,
nice Forth stuff :)).
BTW, the MBR only consists of the first ~60% of the whole sector,
so calling the whole one a "partition table" is also wrong B-)

>  You say label, I say table. Lets call the whole thing off.

In fact, it was (and perhaps still is) to forge a "sector 0" that can
act both as a disklabel and a DOS partition table/MBR.

- S

Re: installing to a wiped disk

2015-07-10 Thread Steffen Grunewald

On Fri, Jul 10, 2015 at 05:17:33PM +0200, Steffen Grunewald wrote:
> On Fri, Jul 10, 2015 at 10:09:52AM -0500, John G Heim wrote:
> > 
> > >
> > >I am using "disklabel:gpt" to create a GPT style partition table. You
> > >can also use "disklabel:msdos" to make the old style partition table.
> > >This belongs on your "disk_config" line before you specify partitions,
> > >file systems, etc.
> > 
> > Oh, when I heard that, I thought it was a way to label a partition
> > -- which makes no sense since it is part of the disk configuration.
> > I should have made the connection because I've been going into
> > parted and entering a mklabel command to create a partition table.
> 
> It's all about the first sector of a direct-access medium.
> 
> Some older operating systems (including SunOS) used the term
> "disk label" synonymously for "partition table".
> There's also the term "master boot record" (MBR) but that already reflects
> its major function in a DOS/Windows setup (Suns used firmware for booting,
> nice Forth stuff :)).
> BTW, the MBR only consists of the first ~60% of the whole sector,
> so calling the whole one a "partition table" is also wrong B-)
> 
> >  You say label, I say table. Lets call the whole thing off.
> 
> In fact, it was (and perhaps still is) to forge a "sector 0" that can
> act both as a disklabel and a DOS partition table/MBR.

It was *possible*. No idea where that word jumped off for an early weekend.

- S

Multiple NFSROOTs (Jessie, Stretch) with FAI 5

2016-07-22 Thread Steffen Grunewald

Hello,

after a long time I'm back to setting up FAI, this time for both Jessie
and Stretch clients (and an old Wheezy setup will have to be merged so
no multiple DHCP servers are around).

I'm following the version 5.0 instructions from the FAI Guide, section
"Setup your faiserver" and found a couple of minor quirks:

- the suggested "sed" commands need a "-i" option to write back to the same file
- while there is a -C option to fai-setup, to select multiple copies of the 
  (/etc/fai) config tree, there's none for the log files (which get overwritten
  as a result)
- uninstalling fai-* may break if by accident /etc/fai was deleted, and even
  reinstalling (apt-get install --reinstall) fai-client and fai-server may fail
  to fix this - I had to edit the pre/postrm scripts kept in /var/lib/dpkg/info
  to overcome this
- the "fai" user at least sometimes isn't removed on uninstall (?)

Also, for Stretch, "libicu52" has to be replaced with "libicu55" in NFSROOT.
(I'm afraid, with all the transitions still going on, more will be added.)

Are there plans for an official Stretch version?

Cheers,

 Steffen

Not reboot/halt after successful installation

2016-07-25 Thread Steffen Grunewald

Hello,

according to the HTML user guide, there seems to be no option to "not reboot"
a node that successfully finished a sysinfo or install run - except to add
something to the error log forcefully (e.g. in a "last" hook).
Citing the guide, "reboot: ... If this flag is not set, and error.log contains
anything, the install client will stop and wait that you press RETURN. If no
errors occurred, the client will always reboot automatically." (similar for
halt).

I have found an old patch I used to apply to FAI 3.x (with boot=live) that
deactivated the check of the error log, and stopped if neither "reboot" nor
"halt" were requested:

--- ./live/filesystem.dir/usr/lib/fai/subroutines.ORIG  2012-06-26 14:31:51.0 +
+++ ./live/filesystem.dir/usr/lib/fai/subroutines   2012-08-15 06:32:48.384327045 +
@@ -510,12 +510,10 @@
 : ${flag_halt:=0}
 
 # reboot/halt without prompting if FAI_FLAG reboot or halt is set and errors are found
-# wait for keypress if error is found and neither flag reboot nor halt is set
-if [ -s $LOGDIR/error.log -a "$flag_reboot" -eq 0 -a "$flag_halt" -eq 0 ]; then
+# wait for keypress if error is neither flag reboot nor halt is set - ignore errors
+if [ "$flag_reboot" -eq 0 -a "$flag_halt" -eq 0 ]; then
 echo "Press  to reboot."
 read
-else
-sleep 10
 fi
 
 sendmon "TASKEND faiend 0"

/usr/lib/fai/subroutines in a jessie nfsroot seems not to check the size of
error.log (nor can I find a mention of error.log in any other place). Is the
(5.0) user guide wrong, or do I have to look in the right places for a change?

Cheers,
 Steffen

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1
D-14476 Potsdam-Golm
Germany
~~~
Fon: +49-331-567 7274
Fax: +49-331-567 7298
Mail: steffen.grunewald(at)aei.mpg.de
~~~
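
The two behaviours discussed in this message can be put side by side. The
sketch below restates the condition from the old FAI 3.x code and the patched
variant as standalone functions; the function names and the "wait"/"proceed"
output are invented for illustration and are not part of FAI.

```shell
# Unpatched condition: wait for a keypress only if error.log is
# non-empty AND neither reboot nor halt was requested.
# Arguments: flag_reboot flag_halt path-to-error.log
end_action_original() {
    if [ -s "$3" ] && [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
        echo wait
    else
        echo proceed
    fi
}

# Patched condition: ignore error.log, wait whenever neither flag is set.
# Arguments: flag_reboot flag_halt
end_action_patched() {
    if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
        echo wait
    else
        echo proceed
    fi
}
```

With an empty error.log and no flags set, the first variant proceeds to the
reboot while the patched one waits, which is the difference the patch was
meant to make.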

Re: FAI 5.1.2 grub problem

2016-08-01 Thread Steffen Grunewald

On Mon, 2016-08-01 at 13:16:58 +0300, "Hannu T. Pysäys" wrote:
> Hi, 
> 
> 
> I have a grub problem (Debian 8.5 / FAI 5.1.2) , GRUB_PC always fails with 
> following code:
> 
> GRUB_PC/10-setup FAILED with exit code 1.
> Can you provide me pointers where to start look this issue? 

What about adding a "-x" option to the shebang line? That should provide you
with a better idea of which step went wrong in that script.
Are you - by any chance - running a hardware RAID?

Cheers,
- Steffen
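
What the "-x" option buys you can be seen with a tiny example. The commands
inside are placeholders, not taken from the GRUB_PC/10-setup script; the
point is only that every command is echoed to stderr with a leading "+"
before it runs.

```shell
# Run a small command list with tracing on, the same tracing that
# "#!/bin/bash -x" enables for a whole script. Trace goes to stderr,
# which is merged into stdout here so both can be captured together.
trace_demo() {
    (
        set -x
        disk=sda
        echo "installing grub to $disk"
    ) 2>&1
}
```

In the failing script, the last traced line before the error message usually
points at the step that went wrong.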

Canonical way to upgrade an existing nfsroot?

2016-09-06 Thread Steffen Grunewald

Apparently I have killed a working nfsroot by naively running "apt-get upgrade".
What's the canonical way to keep a nfsroot updated? Would it be sufficient
to "hold" the dracut* packages so their postinst scripts don't fail?

Thanks,
 S


Re: Canonical way to upgrade an existing nfsroot?

2016-09-06 Thread Steffen Grunewald

On Tue, 2016-09-06 at 15:51:06 +0200, Steffen Grunewald wrote:
> Apparently I have killed a working nfsroot by naively running
> "apt-get upgrade".
> What's the canonical way to keep a nfsroot updated? Would it be sufficient
> to "hold" the dracut* packages so their postinst scripts don't fail?

To avoid all sorts of hassle, I decided to rebuild the nfsroots (jessie,
stretch) from scratch. I'm running into Debian bug #830229 now (jessie/koeln,
044+109-1).
This did happen before as well (044+105-2) but I didn't notice, and stretch
seems to be unaffected (so this may actually be a policy/dpkg issue?).

Back to reinstalling my manual changes...

- S

Re: Canonical way to upgrade an existing nfsroot?

2016-09-06 Thread Steffen Grunewald

On Tue, 2016-09-06 at 16:08:57 +0200, Thomas Lange wrote:
> >>>>> On Tue, 6 Sep 2016 16:05:04 +0200, Steffen Grunewald 
> >>>>>  said:
> 
> > from scratch. I'm running into Debian bug #830229 now (jessie/koeln,
> > 044+109-1).
> > This did happen before as well (044+105-2) but I didn't notice, and
> > stretch seems to be unaffected (so this may actually be a policy/dpkg
> > issue?).
> I also think that this bug is more a dpkg bug.

Well, you don't "backport" a sid package without asking for trouble sometimes.
The "rm_conffiles" line in debian/dracut.maintscript seems to require a newer
dpkg, indeed...
Checking the differences between dpkg-maintscript-helper scripts in dpkg 1.17.27
(jessie) and 1.18.10 (stretch), one quickly finds a bunch of

-  dpkg --compare-versions "$2" le-nl "$LASTVERSION"; then
+  dpkg --compare-versions -- "$2" le-nl "$LASTVERSION"; then

replacements - which seem to be closely related.
dracut's maintscripts seem to make use of that.

I found that you have created a backport for jessie but don't distribute that
via jessie/koeln?
You might tweak the dracut.maintscript (if it's required at all) ...

- S
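
The comparison used in those maintscript helpers can be tried by hand. The
wrapper below is a sketch with a made-up name; it needs dpkg on the PATH and
simply passes through dpkg's exit status. "le-nl" means "less than or equal,
with an empty version treated as later than any version".

```shell
# Thin wrapper around "dpkg --compare-versions"; returns dpkg's exit
# status (0 if the first version is less than or equal to the second).
version_le() {
    dpkg --compare-versions "$1" le-nl "$2"
}
```

For example, "version_le 1.17.27 1.18.10" succeeds, while swapping the two
arguments fails. The "--" added in the newer script presumably keeps a
version argument from being parsed as an option.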

Re: Canonical way to upgrade an existing nfsroot?

2016-09-07 Thread Steffen Grunewald

On Tue, 2016-09-06 at 17:17:44 +0200, Thomas Lange wrote:
> >>>>> On Tue, 6 Sep 2016 17:13:34 +0200, Steffen Grunewald 
> >>>>>  said:
> 
> > I found that you have created a backport for jessie but don't
> > distribute that via jessie/koeln?
> > You might tweak the dracut.maintscript (if it's required at all) ...
> Because jessie/koeln already includes the newest FAI and dracut
> versions which are identical to the versions in backports.

Identical in what sense?


$ lynx -dump -head http://fai-project.org/download/jessie/dracut_044+109-1_all.deb
HTTP/1.1 200 OK
Date: Wed, 07 Sep 2016 08:02:29 GMT
Server: Apache/2.2.22 (Debian)
Last-Modified: Sun, 31 Jul 2016 19:37:50 GMT
ETag: "c210c0-1e5a-538f39e738e74"
Accept-Ranges: bytes
Content-Length: 7770
Connection: close
Content-Type: application/x-debian-package


$ lynx -dump -head http://ftp.de.debian.org/debian/pool/main/d/dracut/dracut_044+109-1~bpo8+1_all.deb
HTTP/1.1 200 OK
Date: Wed, 07 Sep 2016 08:03:02 GMT
Server: Apache/2.4.10 (Debian)
Last-Modified: Thu, 25 Aug 2016 21:07:45 GMT
ETag: "1e90-53aebca144911"
Accept-Ranges: bytes
Content-Length: 7824
Connection: close
Content-Type: application/x-debian-package


$ lynx -dump -head http://ftp.de.debian.org/debian/pool/main/d/dracut/dracut_044+109-1_all.deb
HTTP/1.1 200 OK
Date: Wed, 07 Sep 2016 08:03:11 GMT
Server: Apache/2.4.10 (Debian)
Last-Modified: Sun, 31 Jul 2016 20:11:38 GMT
ETag: "1e5a-538f41755ef9a"
Accept-Ranges: bytes
Content-Length: 7770
Connection: close
Content-Type: application/x-debian-package




BTW, just found out that there's a "stretch/koeln" as well. Trying to rebuild
the nfsroot (had to replace "libpsl0-" with "libpsl5-" in
/etc/fai-stretch/NFSROOT, due to a recent transition; in addition to removing
"libicu52-") - seems to succeed.

Also, one may want to add [trusted=yes] to the fai-project.org apt line... (is
this already in the manual?)

- S
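
Such an apt line might look like the output of the sketch below. The
repository layout (base URL plus "<release> koeln") is inferred from the
download URL and the "stretch/koeln" naming in this message, so treat it as
an assumption; fai_apt_line is a hypothetical helper.

```shell
# Print a sources.list line for the FAI project repository with the
# [trusted=yes] option, which tells apt to skip signature verification
# for this one source. Argument: release name (e.g. jessie, stretch).
fai_apt_line() {
    echo "deb [trusted=yes] http://fai-project.org/download $1 koeln"
}
```

Importing the project's signing key instead would keep signature checks
intact; [trusted=yes] trades that for convenience.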

NIC1 not set in /usr/lib/fai/subroutines

2016-09-07 Thread Steffen Grunewald

I'm getting a warning from FAI 5.1.2 for jessie, when running "sysinfo":

/usr/lib/fai/subroutines: line 813: /sys/class/net//address: No such file or directory

It turns out that $NIC1 doesn't get set.
Since everything else still works, this isn't alarming, but annoying.


- S

Re: NIC1 not set in /usr/lib/fai/subroutines

2016-09-07 Thread Steffen Grunewald

On Wed, 2016-09-07 at 11:15:29 +0200, Thomas Lange wrote:
> >>>>> On Wed, 7 Sep 2016 10:22:28 +0200, Steffen Grunewald 
> >>>>>  said:
> 
> > I'm getting a warning from FAI 5.1.2 for jessie, when running "sysinfo":
> > /usr/lib/fai/subroutines: line 813: /sys/class/net//address: No such
> > file or directory
> 
> > It turns out that $NIC1 doesn't get set.
> > Since everything else still works, this isn't alarming, but annoying.
> Normally NIC1 should be set. Are you booting from FAI CD?

No. "normal" PXE, with only one i/f connected.

I tracked this down a bit. "ip route" returns a single line

10.150.0.0/16 dev eth0  proto kernel  scope link  src 10.150.90.22

- there's no "default" in it.
Because this is a private subnet, the DHCP server doesn't tell the client
anything about gateways, default routes, etc. - this seems to be the root cause.
Hosts get installed with a static network config afterwards, which means that
even if there's a gw, and a default route, it will be added only later.

Would it make sense to pick the i/f from the ip route output if there's only
a single line, to cover such a scenario?
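
The fallback suggested above could look roughly like the following. This is a minimal sketch and not the code in /usr/lib/fai/subroutines; the function name pick_nic is hypothetical. It prefers the device of a default route and otherwise accepts the device of a lone route line:

```shell
#!/bin/sh
# Hypothetical helper (not FAI's own code): read "ip route" output on stdin
# and print the interface to use as NIC1.
pick_nic() {
    awk '
        NF == 0 { next }                      # ignore empty lines
        {
            n++                               # count route lines
            for (i = 1; i < NF; i++)
                if ($i == "dev") {
                    dev = $(i + 1)            # device of this route
                    if ($1 == "default" && def == "") def = dev
                }
        }
        END {
            if (def != "") print def          # default route wins
            else if (n == 1 && dev != "") print dev   # single-route fallback
        }
    '
}

# Typical use on a live system (left commented so the sketch has no side effects):
#   NIC1=$(ip route | pick_nic)
```

With more than one route and no default, the function prints nothing, so the caller still has to handle an empty result.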

> Anyway, I've added a test, that checks $NIC1 before using it.

Cannot harm...

I'm currently running a stretch-based sysinfo on the same machine, same result
(of course) but what's surprising and unexpected, after the recent complaints: 
the devices in /sys/class/net are still named "eth*". No "enp${i}s${j}". 
This is kernel 4.6.4-1... Is there a trick?

Cheers,
 S

Fai 5.1.2 jessie -> stretch regression (netcat missing)

2016-09-09 Thread Steffen Grunewald

Hi,

for the fun of it, and to see how my old hardware might react under a 4.x
kernel, I built a stretch nfsroot and started a sysinfo run on my test machine.
I found that monitoring no longer works:

Can't connect to monserver on faiserver port 4711. Monitoring disabled.

This used to work with the jessie setup.

Changes made:
all over: all "jessie" becomes "stretch", including "jessie koeln"
NFSROOT: "libicu52-" removed, "libpsl0-" becomes "libpsl5-"


Half an hour later, it turned out that for whatever reason, "netcat" wasn't
included anymore, causing the "nc" command in sendmon() to fail.
Adding it to NFSROOT, rebuilding the nfsroot, restarting:
- works now!

Apparently an internal dependency of "some" package in the NFSROOT list on
netcat was removed on the way from Jessie to Stretch. IMO it would not cause any
trouble if netcat was added to NFSROOT - it just belongs there, next to rdate
(who's still using that?) and ntpdate.

Please add this to your list for 5.1.3 ;)

BTW, there's a couple of dracut* packages still in 
http://fai-project.org/download/stretch/
which have been obsoleted by the central repository:
Get:25 http://ftp.de.debian.org/debian stretch/main amd64 dracut-core amd64 044+109-1 [232 kB]
Get:26 http://ftp.de.debian.org/debian stretch/main amd64 dracut-config-generic all 044+109-1 [6002 B]
Get:28 http://ftp.de.debian.org/debian stretch/main amd64 dracut-network all 044+109-1 [48.3 kB]
Get:62 http://ftp.de.debian.org/debian stretch/main amd64 dracut all 044+109-1 [7770 B]
- actually no packages are fetched from fai-project.org anymore (for now, that is)

Cheers,
 S

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1
D-14476 Potsdam-Golm
Germany
~~~
Fon: +49-331-567 7274
Fax: +49-331-567 7298
Mail: steffen.grunewald(at)aei.mpg.de
~~~

add debconf lines to existing setup?

2016-12-01 Thread Steffen Grunewald

Good afternoon,

during FAI installation, $faiconfig/debconf/$class files are added to the
debconf database. This has been working all the time.

Now, I've got a package upgrade which requires setting some (new) values - 
how do I add these settings to an existing, running setup?
/usr/share/debconf/confmodule apparently doesn't work (and db_set therefore
cannot be used).
Do I have to install debconf-utils and use debconf-set-selections, or is 
there a faster way?
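
For reference, debconf-set-selections reads preseed lines of the form "owner question type value" on stdin. The sketch below only formats such a line; the package and question names in the usage comment are hypothetical, and whether this is the fastest way for a given package is not settled here:

```shell
#!/bin/sh
# Format one debconf preseed line: "<owner> <question> <type> <value>".
preseed_line() {
    printf '%s %s %s %s\n' "$1" "$2" "$3" "$4"
}

# On the running host, as root (hypothetical package and question names):
#   preseed_line mypkg mypkg/some-question string somevalue | debconf-set-selections
```

Several lines can be collected in a file and passed to debconf-set-selections as an argument instead of piping them one by one.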

Thanks,
 Steffen


Re: How to select specific version for Debian distribution and packages.

2017-07-13 Thread Steffen Grunewald

On Thu, 2017-07-13 at 15:26:50 +0200, Giorgio Buffa wrote:
> Hello list.
> I would like to install Debian Jessie on my PC. Is it possible to configure
> FAI in order to install the same version of kernel and packages as the ones
> provided in Debian 8.0.0 DVDs?
> 
> In general: is it possible to configure FAI in order to select specific
> version of Debian distribution (e.g v8.0.0) and packages to be installed?

Hi Giorgio,

not sure why you want to do this, but ... you should be able to select
a certain snapshot (that corresponds to the release of 8.0 or whichever
point release - might be tricky to find out the exact date) instead of
the main Debian repository.

HTH,
 Steffen

Re: R: How to select specific version for Debian distribution and packages.

2017-07-13 Thread Steffen Grunewald

On Thu, 2017-07-13 at 15:40:35 +0200, Giorgio Buffa wrote:
> Hi Steffen.
> I need to be able to automatically re-create the exact system (OS and
> configuration) provided to my customer. The documentation says the system
> must be Debian v8.0.0. That's the reason behind my request.

Your customer didn't apply any updates, including security ones? 8-(

Anyway, according to 
https://en.wikipedia.org/wiki/Debian_version_history#Debian_8_.28Jessie.29
you are looking for a 20150426 datestamp.

http://snapshot.debian.org/archive/debian/20150426T134616Z/ seems to be close.

For the exact contents of the DVD you might want to ask the Debian Release Team.
BTW, starting with Wheezy (7), there are only two numbers in a version, so in
a strict sense there's no 8.0.0.
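
As an illustration, an APT source line for that snapshot can be generated as below. This is a sketch: the function name is hypothetical, and where the line goes in a FAI setup (nfsroot sources, config space) is not covered here.

```shell
#!/bin/sh
# Print an APT source line for a snapshot.debian.org timestamp and suite.
snapshot_source() {
    stamp=$1   # e.g. 20150426T134616Z
    suite=$2   # e.g. jessie
    printf 'deb http://snapshot.debian.org/archive/debian/%s/ %s main\n' "$stamp" "$suite"
}

snapshot_source 20150426T134616Z jessie
# Release files of old snapshots are expired, so apt may additionally need
#   Acquire::Check-Valid-Until "false";
# in its configuration.
```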

- S

/tmp read-only? FAI 5.5, Stretch, NFSv3

2018-01-17 Thread Steffen Grunewald

Hello,

after running several sysinfo FAI_ACTIONs with jessie setups (and "aufs"
in the append line), I decided the time has come to switch to Stretch.

I upgraded fai-* to 5.5 from the uni-koeln Stretch repository, copied
/etc/fai to /etc/fai-stretch, added a few packages to NFSROOT, and then
built the NFS-root following the docs (BTW, page 29 has a small "c"
instead of capital "C" in the fai-make-nfsroot example line).

A few "error" lines showed up in the log:
root@t-pring:/etc/fai-stretch# grep -iC3 error: nfsroot.log 
Setting up dracut (045+132-1) ...
dracut: Generating /boot/initrd.img-4.9.0-5-amd64
/usr/lib/dracut/modules.d/45url-lib/module-setup.sh: line 33: warning: command substitution: ignored null byte in input
dracut-install: ERROR: installing '/etc/ssl/certs/ca-certificates.crt'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.Mk2W0u/initramfs /etc/ssl/certs/ca-certificates.crt
/usr/lib/dracut/modules.d/45url-lib/module-setup.sh: line 33: warning: command substitution: ignored null byte in input
dracut-install: ERROR: installing '/etc/ssl/certs/ca-certificates.crt'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.Mk2W0u/initramfs /etc/ssl/certs/ca-certificates.crt
/usr/lib/dracut/modules.d/45url-lib/module-setup.sh: line 33: warning: command substitution: ignored null byte in input
dracut-install: ERROR: installing '/etc/ssl/certs/ca-certificates.crt'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.Mk2W0u/initramfs /etc/ssl/certs/ca-certificates.crt
Setting up openssl (1.1.0f-3+deb9u1) ...
Setting up threeware-control (10.2-1) ...
--
Setting up grub-pc (2.02~beta3-5) ...

Creating config file /etc/default/grub with new version
grub-probe: error: cannot find a device for / (is /dev mounted?).
grub-probe: error: cannot find a device for /boot (is /dev mounted?).
grub-probe: error: cannot find a device for /boot/grub (is /dev mounted?).
Setting up libisccfg140:amd64 (1:9.10.3.dfsg.P4-12.3+deb9u4) ...
Setting up emacs25-nox (25.1+1-4+deb9u1) ...
update-alternatives: using /usr/bin/emacs25-nox to provide /usr/bin/emacs (emacs) in auto mode

- they seem to be sufficiently benign though.

With a pxelinux.cfg file:

root@t-pring:/srv/fai/tftp/pxelinux.cfg# cat 0A966401 
# generated by fai-chboot for host mds-eth0 with IP 10.150.100.1
default fai-generated

label fai-generated
kernel vulcan-stretch/vmlinuz
append initrd=vulcan-stretch/initrd.img ip=dhcp root=10.150.100.198:/srv/fai/nfsroots/vulcan-stretch:vers=3 FAI_FLAGS=verbose,sshd,createvt FAI_CONFIG_SRC=nfs://10.150.100.198/srv/fai/config/vulcan-stretch FAI_ACTION=sysinfo

... I get the infamous /tmp read-only error *although* all mounts are NFSv3:

root@t-pring:~# ssh -oUserKnownHostsFile=/dev/null root@mds-eth0
The authenticity of host 'mds-eth0 (10.150.100.1)' can't be established.
ECDSA key fingerprint is SHA256:dRQY3FUCjC5bCTiiXYEJxNdDVE9v2/ihKy3zc4JSkpk.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'mds-eth0,10.150.100.1' (ECDSA) to the list of known hosts.
root@mds-eth0's password:
Linux mds-eth0 4.9.0-5-amd64 #1 SMP Debian 4.9.65-3+deb9u2 (2018-01-04) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@mds-eth0:~# df
Filesystem                                      1K-blocks      Used Available Use% Mounted on
devtmpfs                                         65946268         0  65946268   0% /dev
tmpfs                                            65963204      1612  65961592   1% /run
10.150.100.198:/srv/fai/nfsroots/vulcan-stretch 622391552 497678848 124712704  80% /
10.150.100.198:/srv/fai/config/vulcan-stretch   622391552 497678848 124712704  80% /var/lib/fai/config
root@mds-eth0:~# mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,noexec,size=65946268k,nr_inodes=16486567,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,mode=755)
10.150.100.198:/srv/fai/nfsroots/vulcan-stretch on / type nfs (ro,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.150.100.198,mountvers=3,mountport=47820,mountproto=udp,local_lock=all,addr=10.150.100.198)
10.150.100.198:/srv/fai/config/vulcan-stretch on /var/lib/fai/config type nfs (ro,noatime,vers=3,rsize=262144,wsize=262144,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.150.100.198,mountvers=3,mountport=47820,mountproto=udp,local_lock=all,addr=10.150.100.198)

(Of course I followed the suggestions,

Re: /tmp read-only? FAI 5.5, Stretch, NFSv3

2018-01-17 Thread Steffen Grunewald

On Wed, 2018-01-17 at 14:21:42 +0100, Thomas Lange wrote:
> >>>>> On Wed, 17 Jan 2018 14:09:03 +0100, Steffen Grunewald said:
> 
> > after running several sysinfo FAI_ACTIONs with jessie setups (and "aufs"
> > in the append line), I decided the time has come to switch to Stretch.
> In stretch you need rootovl instead of aufs in the kernel cmdline.
> We are not using aufs any more but 4.x kernel is using overlayfs.

In fact I had tried "overlayfs" but not "rootovl".

> If you are using fai-chboot, this should be added to the pxelinux cfg
> file.

I forgot to use the -C option, thus /etc/fai (unconfigured) caused 
fai-chboot to use the wrong distro, and therefore add the wrong $bopt.
At the moment, I'm trying to support wheezy, jessie, and stretch - 
this is bound to cause confusion :(
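
When juggling several releases, a quick check of the generated pxelinux.cfg file can catch a wrong boot option before the client boots. A minimal sketch (the function name is hypothetical) that tests whether an append line carries "rootovl":

```shell
#!/bin/sh
# Succeed if the given pxelinux config file passes "rootovl" on an append line.
has_rootovl() {
    grep -Eq '^[[:space:]]*append[[:space:]](.*[[:space:]])?rootovl([[:space:]]|$)' "$1"
}

# Example, using the file name from the earlier message:
#   has_rootovl /srv/fai/tftp/pxelinux.cfg/0A966401 || echo "rootovl missing"
```

This only inspects the result; the actual fix remains pointing fai-chboot at the right configuration directory with -C, as described above.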

> > (BTW, page 29 has a small "c"
> > instead of capital "C" in the fai-make-nfsroot example line).
> Thanks, this is now fixed.

Thank you for the quick response. "rootovl" will be written, in large,
friendly letters, on the title page of the fai manual...

- S

Use of /dev/disk/by-path/... in disk_config?

2018-01-18 Thread Steffen Grunewald

Hello,

I'm still planning my installation, and found that a future storage server
may move its /dev/sd* devices around if another JBOD is connected.
I want to access the internal disks, of course, and set them up as softRAID.

1. Is it possible to map "/dev/disk/by-path/pci-:00:11.4-ata-1" to
   "disk1" (etc.) for easier reference later?

2. Would the following work (with that mapping merged):

###disk1 = /dev/disk/by-path/pci-:00:11.4-ata-1
disk_config disk1 disklabel:msdos align-at:4k
primary -   512 - -
primary -   32G - -
# set aside for now
primary -   -100% - -

###disk2 = /dev/disk/by-path/pci-:00:11.4-ata-2
disk_config disk2 disklabel:msdos align-at:4k
primary -   512 - -
primary -   32G - -
# set aside for now
primary -   -100% - -

disk_config raid fstabkey:uuid bootable:1
raid1   /boot   disk1.1,disk2.1 ext4    rw,noatime,nodiratime,errors=remount-ro createopts="-m0"
raid1   /       disk1.2,disk2.2 ext4    rw,noatime,nodiratime,errors=remount-ro createopts="-m1"

  or do I have to take additional measures to make sure that the md devices
  "find" their components?

3. Has anyone setup ZFS using FAI?

Thanks,
 Steffen

Re: Use of /dev/disk/by-path/... in disk_config?

2018-01-18 Thread Steffen Grunewald

On Thu, 2018-01-18 at 13:12:24 +0100, Thomas Lange wrote:
> >>>>> On Thu, 18 Jan 2018 12:40:16 +0100, Steffen Grunewald said:
> 
> > 1. Is it possible to map "/dev/disk/by-path/pci-:00:11.4-ata-1" to
> >"disk1" (etc.) for easier reference later?
> Yes. I use a script class/99-disklist.sh for changing the list of disks to
> my local needs. The script is attached. It uses the disks model numbers
> to get a new order of the disks. The same should work with the serial
> number or path.

Thanks. Since the "find" command returns an unsorted list, how do you make sure
you get the right order?
(The way the disklist gets overwritten looks a bit, um, non-standard, but
efficient...)
Am I correct that "disk1" maps to the first item in the disklist, and so on?
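
The attached script is not reproduced in this archive. As a rough illustration of the by-path variant, the sketch below lists kernel device names in the order of the sorted by-path names. The function name is hypothetical, and how the result is handed to FAI as the new disklist is deliberately left out:

```shell
#!/bin/sh
# Print kernel names (sda, sdb, ...) of whole disks whose by-path name
# contains a pattern, ordered by the sorted by-path names.
disks_by_path() {
    dir=${1:-/dev/disk/by-path}
    pattern=${2:-ata}
    for link in "$dir"/*; do               # glob expansion is sorted
        [ -e "$link" ] || continue
        case ${link##*/} in
            *-part[0-9]*) continue ;;      # skip partition links
            *"$pattern"*) ;;               # keep matching paths
            *) continue ;;
        esac
        basename "$(readlink -f "$link")"  # resolve symlink to sdX
    done
}

# Example: disks_by_path /dev/disk/by-path ata
```

The sort is lexical, so "ata-10" would come before "ata-2"; with more than nine ports a numeric sort would be needed.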

> > 2. Would the following work (with that mapping merged):
> 
> >   or do I have to take additional measures to make sure that the md devices
> >   "find" their components?
> I do not understand what you mean by "find". Here's a raid1 example
> from me. Also have a look at the nice feature sameas:disk1.

Between setting up the machine with FAI, and the reboot after maintenance,
the number of external JBOD disks may have changed. (Cable failure?)
Will md devices use the modified device names?

"sameas" is useful, and I've seen that before, but I have diverging plans
for the third partition.

> disk_config disk1 align-at:1M
> primary  -      30G-100G   -     -
> logical  swap   1G-10G     swap  sw
> logical  -      1G-500G    -     -
> logical  -      1G-        -     -

BTW, can I mix absolute and relative sizes (as in "500G-100%")?
I want setup-storage to fail if for an unknown reason it picked up
the wrong disks...

> disk_config disk2 sameas:disk1
> 
> disk_config raid   fstabkey:uuid preserve_reinstall:1,2
> raid1  /       disk1.1,disk2.1  ext4  rw,noatime,errors=remount-ro  createopts="-m15"
> raid1  /home   disk1.6,disk2.6  ext4  rw,noatime,nosuid  createopts="-m1" tuneopts="-c0 -i0"
> raid1  /srv    disk1.7,disk2.7  ext4  rw,noatime,nosuid  createopts="-m1" tuneopts="-c0 -i0"

Oh, I never learned about "tuneopts" before... very handy!

> > 3. Has anyone setup ZFS using FAI?
> Kerim, who wrote the btrfs code for setup-storage was thinking about
> adding ZFS support in the past. Maybe the interested people can do
> some funding to get this implemented.

Ah, OK... for now, partition and mountdisks hooks may do the trick...

Thanks, Steffen

Re: Use of /dev/disk/by-path/... in disk_config?

2018-01-18 Thread Steffen Grunewald

On Thu, 2018-01-18 at 14:19:24 +0100, Thomas Lange wrote:
> >>>>> On Thu, 18 Jan 2018 14:09:49 +0100, Steffen Grunewald said:
> 
> > Thanks. Since the "find" command returns an unsorted list, how do you make sure
> > you get the right order?
> I do not care, since I only match on the model type of the disk, not
> exactly which disk. You could use the serial number, or use the
> by-path directory.

I chose the latter, which is easier to obtain.
At least for built-in disks, the list is stable then, and of external
ones, ZFS keeps proper track. (Still remember that video showing how
robust ZFS is, with about a dozen USB thumb drives being pulled out and
replugged in random order...)

Talking about "path" - is there a trick to get multipath-tools installed
in the nfsroot? All my attempts fail due to initramfs not cooperating 
with dracut.

> > Am I correct that "disk1" maps to the first item in the disklist, and so on?
> Yes.

Meanwhile I found that hidden in the end of the setup-storage man page.
The FAI guide itself contains no reference to disk$number.

> > Between setting up the machine with FAI, and the reboot after maintenance,
> > the number of external JBOD disks may have changed. (Cable failure?)
> > Will md devices use the modified device names?
> I do not know. But I'm pretty sure that md uses some uuid on the
> partitions of the disks to assemble the md device.

I'll find out, the hard way. (This should be common, and have been
thought of by developers before, so I'm not afraid.)

> > BTW, can I mix absolute and relative sizes (as in "500G-100%")?
> Yes. IIRC internally we convert everything to blocks.

Nice. If I get the disks mixed up, this can act as an emergency brake.
Would be nice if one could specify lower and upper capacity limits in
a disk_config template...

> > Ah, OK... for now, partition and mountdisks hooks may do the trick...
> Yes, hooks always do the trick ;-)

Hooks, to hang yourself...

> I guess it would be fine if you could publish your zfs hooks here on
> the list. Others may be interested in them.

I'm still in the initial phases (planning, you know). Basically, I'll
get a list of JBOD disks (from /dev/disk/by-path, selecting "sas-ext"
instead of "ata"), split that into sets of n+2+1, pipe that through
"while read" to further split off one spare, and run a "zpool create".
For a very long time (10+ years???) I haven't looked into ZFS anymore,
time to dig out the old Sun admin guides, in addition to Aaron Toponce's
article series from 2012/13... ZIL and ARC will go to the extra SSD 
partitions, as you may have guessed.
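
The grouping step described above might be sketched as follows. This is a dry run that only prints commands; the pool names, the choice of the first disk of each set as the spare, and the function name are assumptions made for illustration:

```shell
#!/bin/sh
# Read disk names on stdin, group them into sets of $1 disks, and print
# (not run) one "zpool create" per full set: raidz2 members plus one spare.
plan_pools() {
    per_set=$1                              # n + 2 + 1 disks per set
    xargs -n "$per_set" | {
        idx=0
        while read -r spare members; do
            set -- $members                 # count the raidz2 members
            [ $(( $# + 1 )) -eq "$per_set" ] || continue   # drop incomplete set
            echo "zpool create pool$idx raidz2 $members spare $spare"
            idx=$((idx + 1))
        done
    }
}

# Example with the by-path selection mentioned above and sets of 8+2+1:
#   ls /dev/disk/by-path/*sas-ext* | plan_pools 11
```

Log and cache devices on the SSD partitions are not part of this sketch.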

Still a long way to go.

Thanks for your constant support.
 S

Re: Use of /dev/disk/by-path/... in disk_config?

2018-01-19 Thread Steffen Grunewald

On Thu, 2018-01-18 at 18:21:03 +0100, Robert Markula wrote:
> Am 18.01.2018 um 14:19 schrieb Thomas Lange:
> > I guess it would be fine if you could publish your zfs hooks here on
> > the list. Others may be interested in them.
> 
> Yes, that would be nice. My impression is that nextgen fault tolerant
> filesystems like btrfs and ZFS are gaining popularity, last but not
> least thanks to Canonical's adoption of ZFS in Ubuntu two years ago.

Bad news: I just learned that the systems will get CentOS installed.
While there's still an option to replace the system disks, and run 
another installation (Debian 9 via FAI) I'm afraid that will void
our warranty.

> Since btrfs still has some rough edges I'll be looking at ZFS for a
> research project scheduled later this year, so I'm definitely interested
> in any progress in this topic.

No promises yet...

- S

Re: Use of /dev/disk/by-path/... in disk_config?

2018-01-21 Thread Steffen Grunewald

Hi,

On Thu, 2018-01-18 at 16:23:03 +0100, Thomas Lange wrote:
> >>>>> On Thu, 18 Jan 2018 16:08:32 +0100, Steffen Grunewald said:
> 
> > Talking about "path" - is there a trick to get multipath-tools installed
> > in the nfsroot? All my attempts fail due to initramfs not cooperating 
> > with dracut.
> Just add the packages to /etc/fai/NFSROOT. Then they will be
> installed into the nfsroot when you recreate the nfsroot.

This didn't work, I tried multiple times (and learned my lesson not to
create a new nfsroot as long as there are machines mounting the old one).

With a nfsroot otherwise functional, I've chroot-ed inside, and
"apt-get -s install multipath-tools" offers the following:

Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  initramfs-tools initramfs-tools-core klibc-utils libaio1
  libboost-iostreams1.62.0 libboost-random1.62.0 libboost-system1.62.0
  libboost-thread1.62.0 libklibc libnspr4 libnss3 librados2 libsgutils2-2
  liburcu4 sg3-utils sg3-utils-udev
Suggested packages:
  bash-completion multipath-tools-boot
Recommended packages:
  busybox | busybox-static
The following packages will be REMOVED:
  dracut* zfs-dracut*
The following NEW packages will be installed:
  initramfs-tools initramfs-tools-core klibc-utils libaio1
  libboost-iostreams1.62.0 libboost-random1.62.0 libboost-system1.62.0
  libboost-thread1.62.0 libklibc libnspr4 libnss3 librados2 libsgutils2-2
  liburcu4 multipath-tools sg3-utils sg3-utils-udev
0 upgraded, 17 newly installed, 2 to remove and 3 not upgraded.
[...]

I guess it's not the best idea to remove dracut from a FAI nfsroot.
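
Such a conflict can be detected before rebuilding the nfsroot. The simulation output (elided above) also contains one "Remv <package> ..." line per package that would be removed, so a small filter is enough. A sketch with a hypothetical function name:

```shell
#!/bin/sh
# Read "apt-get -s install ..." output on stdin and print the names of the
# packages the simulation would remove ("Remv" lines).
would_remove() {
    awk '$1 == "Remv" { print $2 }'
}

# Example inside the nfsroot chroot:
#   apt-get -s install multipath-tools | would_remove | grep -qx dracut \
#       && echo "installing this would remove dracut"
```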

> Do you need the multipath tools for booting the machine, or just for
> accessing some disk during installation?

To sort out 60 JBOD disks in a HGST box (very similar to Thumpers if you
still remember those) connected via two SAS cables. Both show up twice as
/dev/sd* and /dev/sg*, only once in /dev/disk/by-id, and twice in .../by-path
(obviously).

> If needed for booting, we may have to customize the initrd inside the
> nfsroot (copied to the tftp dir), because dracut may not work
> out-of-the-box with multipath devices.
> But I'm also the Debian maintainer of dracut, if you need some help.

Since dracut* is kind of an alternative approach to initramfs* you might want
to get your packages listed as alternatives, to resolve the dependencies, if
that's feasible (is it really a drop-in replacement?)
AFAICT it's sg3-utils-udev that causes the issue to show up, by depending
on initramfs-* stuff. With a udev involved, this looks a bit trickier than
with ordinary packages.

> > Thanks for your constant support.
> It would be nice if you could update your FAI questionnaire. The last
> I got is from 2009. And I would like to add you to the references page
> with your logo. https://fai-project.org/references/

I'll try and do my best...

Cheers,
- S

FAI setup for BeeGFS with ZFS, Re: Use of /dev/disk/by-path/... in disk_config?

2018-01-24 Thread Steffen Grunewald

Hi,

On Fri, 2018-01-19 at 17:37:38 +0100, Thomas Lange wrote:
> >>>>> On Fri, 19 Jan 2018 16:34:33 +0100, Steffen Grunewald said:
> 
> > Bad news: I just learned that the systems will get CentOS installed.
> > While there's still an option to replace the system disks, and run 
> > another installation (Debian 9 via FAI) I'm afraid that will void
> > our warranty.
> You get warranty when running CentOS on hardware? Normally the vendors
> wants you to run RHEL to get warranty.
> But you still can install CentOS using FAI ;-)

It turned out that Debian 9 was installed, something I'm more familiar with.
(We get support on the hardware if it got setup by one of the vendor's 
engineers - whether that would be void if I did my own config is left as
an exercise for the lawyers.)

Still, FAI doesn't seem to offer an option to handle unpartitioned disks to
be formatted as ext4 (I couldn't find any documentation involving "raw-disk"
except for md or lvm usage?), suggested for BeeGFS metadata storage.
Also, with multipath-tools not installable in the nfsroot (yet), the setup of
ZFS pools for the object storage had to be postponed until after the first
reboot (one may want to scan the JBOD and assign /dev/mapper/disk* aliases,
before creating zpools - those HGST boxes don't offer a lot of support for
the poor guy who's got to replace a failed disk, and /dev/sd* names seem to be
assigned in a rather random order).
So no, there's no consistent, FAI-only solution for setting up BeeGFS storage
on such hardware yet.
I'm afraid that the motivation to redo this setup is pretty small (never 
change...), thus there won't be anything useful from my side for the weeks
to come.

I apologize if that's bad news, but I didn't have time nor means to plan this
carefully to be fully done within FAI.

Cheers,
 Steffen

Re: FAI setup for BeeGFS with ZFS, Re: Use of /dev/disk/by-path/... in disk_config?

2018-01-25 Thread Steffen Grunewald

On Wed, 2018-01-24 at 10:05:09 +0100, Thomas Lange wrote:
> >>>>> On Wed, 24 Jan 2018 09:03:04 +0100, Steffen Grunewald said:
> 
> > Still, FAI doesn't seem to offer an option to handle unpartitioned disks to
> > be formatted as ext4 (I couldn't find any documentation involving "raw-disk"
> Here's an example line:
> 
> raw-disk / 4GB ext4 rw createopts="-F"
> 
> -F helps mkfs not to stop and ask some questions.

Please add this to the documentation somewhere; all I could find was the
"- 0 - -" line (to be used by lvm or raid later), and I did not expect this
to be more capable.

For a BeeGFS metadata storage, add "-i 2048 -I 512 -J size=400 -Odir_index,filetype".

For ZFS-based storage, a hook seems to be unavoidable - and better management of
large JBODs could make good use of multipathing. Mapping physical locations to
/dev/mapper/disk${index} would be a plus...
I won't pursue this any further now but will keep it on my list.

Thanks, 
 Steffen

Non-X86 architectures?

2018-04-19 Thread Steffen Grunewald

Good morning,

is there anyone using FAI to install on architectures that are not X86*?
(I had hacked some support for alpha more than a dozen years ago but those
machines are long gone - I'm looking into arm64 and powerpc now but am open
for almost anything beyond Intel/AMD.)

Thanks,
 S


Alternatives in package_lists?

2018-04-26 Thread Steffen Grunewald

Hello,

I couldn't find an answer to the question, and no matching example:

Is it possible to specify "a | b" in a package_config file, so that "a"
gets installed if available, and "b" otherwise?

Just listing "a b" will give bad results if both packages are available
- and in this specific case, there is no "conflict" between them - 
depending on the repositories involved.
Of course I could try this myself... :/

Thanks, S


Re: Alternatives in package_lists?

2018-04-27 Thread Steffen Grunewald

On Thu, 2018-04-26 at 21:20:44 +0200, Thomas Lange wrote:
> >>>>> On Thu, 26 Apr 2018 17:34:23 +0200, Steffen Grunewald said:
> 
> > Is it possible to specify "a | b" in a package_config file, so that "a"
> > gets installed if available, and "b" otherwise?
> FAI only creates a long list of package names and then calls apt-get,
> aptitude or apt. If one of those tools provides a function like install
> a if available or b otherwise, then FAI could use it. FAI itself does
> not have this function yet. Patches are welcome ;-)

apt-get install "a | b" doesn't work.

> I wonder if any of the config management tools provides such a function?

Not directly, I'm afraid - but what seems to be feasible is to create a 
metapackage that has "a | b" as its only install dependency, and add that
to the package list.
Of course, the primary goal is to fix the underlying problem - which may
take longer than the time I have to come up with a working setup though.
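
One way to build such a metapackage is the equivs tool. The sketch below only generates a control file; the package name "a-or-b" and the function name are hypothetical, and the resulting .deb would still have to be published in a repository the install clients can reach:

```shell
#!/bin/sh
# Print an equivs control file for a metapackage depending on "first | second".
alt_control() {
    name=$1
    first=$2
    second=$3
    cat <<EOF
Section: misc
Priority: optional
Standards-Version: 3.9.2
Package: $name
Version: 1.0
Depends: $first | $second
Description: prefer $first, fall back to $second
 Hypothetical metapackage generated for illustration.
EOF
}

# Example:
#   alt_control a-or-b a b > a-or-b.ctl && equivs-build a-or-b.ctl
```

The package list then names only "a-or-b", and apt picks the first installable alternative when resolving the dependency.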

Thanks,
 S

UEFI issues booting into FAI sysinfo on Dell PowerEdge

2018-09-03 Thread Steffen Grunewald

Hello,

I've been using FAI for years now, and could always avoid UEFI - until last
week when a coworker asked me to "quickly" run a sysinfo for one of his new
machines.
Of course, it comes with UEFI activated.
After finding how to distinguish between BIOS and UEFI PXE requests, and
setting up the DHCP/TFTP server accordingly, it was only "yet another step"
to find and add ldlinux.e64 to get the machine booting.
It requests the kernel, and loads it.
It requests the initrd, and hangs doing so.
/var/log/daemon shows both requests and nothing more after them.

There's a warning
 "core_udp_sendto: stalling on configure with no mapping"
further up the boot screen, but I'm not sure it's related.

The corresponding pxelinux.cfg file was created once, worked in legacy mode.

Unfortunately, UEFI-related FAI documentation is very sparse, and I may have
overlooked something?
Machine is a Dell one (if this makes a difference), PowerEdge R840.

Fortunately, the machines were able to boot in legacy mode, so the goal was
reached to a major extent (I couldn't test EFI detection though, nor run
"efibootmgr -v" within one of the class scripts).

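For the EFI detection mentioned above, a common approach is to test for the /sys/firmware/efi directory, which exists only when the kernel was booted via UEFI. A minimal sketch; the function name and the class name in the usage comment are hypothetical:

```shell
#!/bin/sh
# Report whether the running system was booted via UEFI. The sysfs root is a
# parameter only so that the check can be tried against a test directory.
boot_mode() {
    sysroot=${1:-/sys}
    if [ -d "$sysroot/firmware/efi" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}

# In a class script one might emit a class name depending on the result:
#   [ "$(boot_mode)" = UEFI ] && echo UEFI_BOOT
```
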

Any suggestions what to try next?

Thanks,
 Steffen


Re: UEFI issues booting into FAI sysinfo on Dell PowerEdge

2018-09-03 Thread Steffen Grunewald

On Mon, 2018-09-03 at 14:44:54 +0200, Thomas Lange wrote:
> >>>>> On Mon, 3 Sep 2018 14:26:49 +0200, Steffen Grunewald said:
> 
> > After finding how to distinguish between BIOS and UEFI PXE requests, and
> > setting up the DHCP/TFTP server accordingly, it was only "yet another step"
> > to find and add ldlinux.e64 to get the machine booting.
> I use syslinux.efi in the dhcpd.conf for UEFI machines. I have

I'm following the instructions from SYSLINUX for automatic handling, with minor
modifications I stole from somewhere else:

if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:0" {
  filename "pxelinux.0";
  # needs ldlinux.c32
}
if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:6" {
  filename "syslinux32.efi";
  # needs ldlinux.e32
}
if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:7" {
  filename "syslinux64.efi";
  # needs ldlinux.e64
}
if substring(option vendor-class-identifier, 0, 20) = "PXEClient:Arch:9" {
  filename "syslinux64.efi";
}

Seems to work for 64-bit UEFI. (It's somewhat confusing that an x86_64 EFI file
would also work on an aarch64 machine, btw., but I have only one of those.)

> syslinux.efi and ldlinux.e64 in my tftp directory. Then I can use the

Indeed I had been missing ldlinux.e64 initially, but quickly learned about my
mistake from /var/log/daemon.log. syslinux64.efi has been copied from the
syslinux 64-bit modules tree, and renamed accordingly.

> same syntax in the pxelinux.cfg files as before.

Thank you for confirming that no modification is necessary!

> > /var/log/daemon shows both requests and nothing more after them.
> I had problems with a broken UEFI Bios on a Thinkpad.

Re: UEFI issues booting into FAI sysinfo on Dell PowerEdge

2018-09-03 Thread Steffen Grunewald

On Mon, 2018-09-03 at 15:06:24 +0200, Rémy Dernat wrote:
> Hi,
> 
> I suggest you retrieve SYSLINUX version 6.04; look here for more
> information:
> https://groups.google.com/a/lbl.gov/forum/#!msg/warewulf/klTLgX-L4nw/IJZo3-jgAAAJ

I'm stuck with 6.03 right now.
(That thread is yet another place suggesting to save memory by going for
legacy boot!)

> BTW, sometimes an update of the firmware on the client gives great results.

The machines are brand-new... I doubt that there's a BIOS update yet, and of
course if there was it'd be nice to apply it over the network...

Thanks,
 S

Re: UEFI issues booting into FAI sysinfo on Dell PowerEdge

2018-09-03 Thread Steffen Grunewald

On Mon, 2018-09-03 at 15:41:07 +0200, Thomas Lange wrote:
> >>>>> On Mon, 3 Sep 2018 15:31:44 +0200, Steffen Grunewald said:
> 
> > On Mon, 2018-09-03 at 14:44:54 +0200, Thomas Lange wrote:
> > I remember that the "legacy" sysinfo run also took a long while to get started,
> > so it might be hardware-specific.
> If the boot/startup process of FAI takes a long time, it may be this issue:
> https://github.com/faiproject/fai/commit/42abe35ed4d5a8c6f63e3bd334f4ab7339e
> 
> Without this fix, rsyslogd may need a long time to start in some situations.

Interesting, but I don't see this on other machines - and my FAI version has
this fix already incorporated (5.7 from the koeln repository).
I guess the delay is caused by the large initrd (30 MB).
I'm lucky not to be bitten by the watchdog then...

- S

Re: UEFI issues booting into FAI sysinfo on Dell PowerEdge

2018-09-03 Thread Steffen Grunewald

On Mon, 2018-09-03 at 15:43:47 +0200, Thomas Lange wrote:
> >>>>> On Mon, 3 Sep 2018 15:34:34 +0200, Steffen Grunewald 
> >>>>>  said:
> 
> >> I suggest you retrieve SYSLINUX version 6.04; look here for more
> >> information:
> >> https://groups.google.com/a/lbl.gov/forum/#!msg/warewulf/klTLgX-L4nw/IJZo3-jgAAAJ
> 
> Now I remember that it was not the BIOS update that fixed my Thinkpad problem,
> but using a newer syslinux.efi from syslinux 6.04 was the proper fix.

Will try that next. (For whatever reason there was no update of syslinux
in Debian for years... not even a backport. Stretch is growing old already.)

Thanks, S

Resolved using syslinux 6.04~, was: Re: UEFI issues booting into FAI sysinfo on Dell PowerEdge

2018-09-11 Thread Steffen Grunewald

On Mon, 2018-09-03 at 16:21:36 +0200, Steffen Grunewald wrote:
> On Mon, 2018-09-03 at 15:43:47 +0200, Thomas Lange wrote:
> > >>>>> On Mon, 3 Sep 2018 15:34:34 +0200, Steffen Grunewald 
> > >>>>>  said:
> > 
> > >> I suggest you retrieve SYSLINUX version 6.04; look here for more
> > >> information:
> > >> https://groups.google.com/a/lbl.gov/forum/#!msg/warewulf/klTLgX-L4nw/IJZo3-jgAAAJ
> > 
> > Now I remember that it was not the BIOS update that fixed my Thinkpad problem,
> > but using a newer syslinux.efi from syslinux 6.04 was the proper fix.
> 
> Will try that next. (For whatever reason there was no update of syslinux
> in Debian for years... not even a backport. Stretch is growing old already.)

I owe you a report - a success report, that is.
Using the files from syslinux-{common,efi}_6.04~git (as recommended by Thomas,
already on July 11 - I must have missed that during my holidays), the boot delay
almost vanished. No issues booting the Dell machine, and I also succeeded with
an AMD Epyc one.
For the latter, I found that the UEFI mode results in 7 *more* available memory
pages than the LEGACY one (unless my counting algorithm is faulty).
Not too big a difference, and given that I had to flip seven switches to go UEFI
(boot mode, and 6 PCI-E OPROM modes), I'm still hesitant to enforce UEFI, but
at least now I know that UEFI-only hardware won't be a showstopper anymore.
(Got to think about adding a /boot/efi partition to all relevant
disk_configs...)
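
(Editorial sketch, not part of the original mail: one way to decide at install time whether a /boot/efi partition is needed is to check how the client was booted. The Linux kernel creates /sys/firmware/efi only on UEFI boots. boot_mode and its optional sysfs-root argument are hypothetical, the argument existing only to make the function testable; how the result is turned into a FAI class is left open here.)

```shell
#!/bin/sh
# Sketch: report whether the running system was booted via UEFI.
# The kernel exposes /sys/firmware/efi only on UEFI boots.
# boot_mode and its optional argument are hypothetical names.
boot_mode() {
    sysfs_root=${1:-/sys}    # overridable for testing, defaults to /sys
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "LEGACY"
    fi
}
```

A class script could call boot_mode without arguments and, on "UEFI", select a disk_config variant that carries the extra /boot/efi partition.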

Thanks for all your suggestions. 
Now back to the preparations for the new setup (delivery will be in November,
but I've got some hardware to test before)...

- Steffen
