PXE boot is an infinite reinstall

2011-10-17 Thread ~Stack~
Hello All,

I ran into another issue with my PXE build out. I searched the net and
found many people with the same issue, but there was either no response
or their solution would not work for my needs (requiring access to
software I don't have). What I am after with this is a completely
unmanaged automated install of a client on boot.

I am using dnsmasq as my DNS, DHCP, and TFTP server.

I have a server and a client. The client boots off the network card with
PXE. It asks for and receives an IP from the DHCP server and proceeds
with pulling the TFTP information. The TFTP server passes it a
pxelinux.0 file along with the "default" configuration. The
configuration has a kickstart file and the client continues with a
flawless install of SL6.1. After the install, the client reboots...and
the whole process starts over and over and over again. I know why it
does this (the default boot option is to install), but I can't figure
out how to control it.

What I would like is a process where I boot the clients from an off
state, have them do a fresh install, and then reboot into the new
install. Nothing is stored on these nodes and a fresh install goes
rather quickly so I don't mind this option.

At first I tried scripting an option that just toggled the tftp default
menu but it wasn't working very smoothly as not all my hosts boot at
equal speeds.

I attempted chainloading in the tftp but just made a mess and I didn't
get any different results. Most likely due to me not understanding it
properly. I am open to pointers.

I thought I could do it inside of dnsmasq, but I couldn't find a good
example and my attempts didn't work.

I looked online and found projects like systemimager.org but I am
already doing most of what they provide. I attempted to reverse-engineer
their Perl scripts but that is a bigger project than I initially thought.
What I did like about this project was the ability to tell it to allow a
single host or a group of hosts to reinstall or to boot off the hard disk.

I have gotten some great pointers from this list so far and I am really
hoping someone might have another for me. Any ideas?

Thanks!

~Stack~


Re: PXE boot is an infinite reinstall

2011-10-17 Thread Orion Poplawski

On 10/17/2011 04:05 PM, ~Stack~ wrote:
> I have gotten some great pointers from this list so far and I am really
> hoping someone might have another for me. Any ideas?


I use Cobbler. I used to roll my own, but had to manage the booting
manually, as you have found.


--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division  FAX: 303-415-9702
3380 Mitchell Lane  or...@cora.nwra.com
Boulder, CO 80301  http://www.cora.nwra.com


Re: PXE boot is an infinite reinstall

2011-10-17 Thread Steven Timm

The trick that Rocks uses is to have a boot order of (hard disk, pxe)
and then when you want to reinstall, change two bytes in the
boot sector to make the hard disk unbootable and it will fall through
to a PXE boot only at that time.
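
A rough illustration of the idea, not Rocks' actual code. The message does not say which two bytes are changed; the sketch assumes they are the 0x55 0xAA signature at offsets 510 and 511 of an MBR boot sector, which many BIOSes check before booting a disk. The function names are made up for the example, and it works on a scratch image file rather than a real device.

```python
import os
import tempfile

SIGNATURE_OFFSET = 510          # last two bytes of the 512-byte boot sector
MBR_SIGNATURE = b"\x55\xaa"     # marker many BIOSes check before booting a disk


def clear_boot_signature(path):
    """Zero the MBR signature in `path`; return the two bytes that were there.

    Assumption: the "two bytes" are the 0x55 0xAA signature. Try this on a
    scratch image first; on a real device it makes the disk unbootable.
    """
    with open(path, "r+b") as disk:
        disk.seek(SIGNATURE_OFFSET)
        previous = disk.read(2)
        disk.seek(SIGNATURE_OFFSET)
        disk.write(b"\x00\x00")
    return previous


def make_scratch_image():
    """Create a throwaway 512-byte "disk" carrying a valid signature."""
    fd, path = tempfile.mkstemp(suffix=".img")
    with os.fdopen(fd, "wb") as img:
        img.write(b"\x00" * SIGNATURE_OFFSET + MBR_SIGNATURE)
    return path
```

Whether the firmware then falls through to PXE depends on the BIOS and its boot order, and none of this applies as-is to UEFI systems. The reinstall rewrites the boot sector, so nothing needs to be restored afterwards.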

What worker node installs at Fermilab do is to have a DHCP server that
only answers the PXE request when you want to reinstall, and no other
time, so the PXE request just times out and then you boot off the hard 
drive.
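
Since the original question uses dnsmasq, here is a hedged sketch of something similar there. It is not the Fermilab setup: dnsmasq would still hand every host a lease and only withhold the boot filename from hosts that are not flagged. It relies on dnsmasq's `dhcp-host=<mac>,set:<tag>` and `dhcp-boot=tag:<tag>,<file>` options; the tag name, the MAC address, and the function name are assumptions made up for the example.

```python
def dnsmasq_install_fragment(reinstall_macs, tag="install", boot_file="pxelinux.0"):
    """Return dnsmasq config lines offering `boot_file` only to the listed MACs."""
    lines = [f"dhcp-host={mac.lower()},set:{tag}" for mac in sorted(reinstall_macs)]
    # Only requests carrying the tag are sent a boot filename.
    lines.append(f"dhcp-boot=tag:{tag},{boot_file}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical MAC address of a node that should be reinstalled.
    print(dnsmasq_install_fragment({"52:54:00:AA:BB:01"}), end="")
```

Any untagged `dhcp-boot` line would have to be removed from the main configuration for this to have an effect, and dnsmasq would need a restart to pick up the new fragment. How quickly a PXE ROM gives up when it receives no boot filename varies with the firmware.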


Steve

--
Steven C. Timm, Ph.D  (630) 840-8525
t...@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.


Re: PXE boot is an infinite reinstall

2011-10-17 Thread ~Stack~
On 10/17/2011 06:26 PM, Steven Timm wrote:
> The trick that Rocks uses is to have a boot order of (hard disk, pxe)
> and then when you want to reinstall, change two bytes in the
> boot sector to make the hard disk unbootable and it will fall through
> to a PXE boot only at that time.

That is actually a really good idea now that I think about it. It has
been a few years since I used Rocks. I am really glad they are still
around. I just looked and saw they are still RHEL 5 based and not 6
(which I need). Oh well.

I have been thinking about this for a short while now, and I already
think I know how to script this to work the way I want it to.

Thanks for the suggestion!

~Stack~


Re: PXE boot is an infinite reinstall

2011-10-17 Thread Sergio Ballestrero
What we (CERN ATLAS Online) do is have a default PXE config that points
back to localboot, and that is overridden for a specific PC when you want
to reinstall it. The %post section of the kickstart then has to contain a
call to a CGI (e.g. using wget) that resets that PC's PXE config to the
default. The same kind of mechanism is used by Cobbler and by Quattor's PXE
install system. This is faster than letting PXE time out.
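
A minimal sketch of that mechanism with PXELINUX, assuming the usual lookup order in which a per-host file named `01-` plus the MAC address (lowercase, dash-separated) is tried before `default`. The kernel and initrd names, the kickstart URL, and all function names are placeholders invented for the example; this is not the ATLAS code.

```python
from pathlib import Path

# Fallback config: hand control back to the local disk.
LOCALBOOT_CFG = "DEFAULT local\nLABEL local\n  LOCALBOOT 0\n"

# Per-host override: boot the installer with a kickstart (placeholder paths).
INSTALL_CFG = (
    "DEFAULT install\n"
    "LABEL install\n"
    "  KERNEL vmlinuz\n"
    "  APPEND initrd=initrd.img ks={ks_url}\n"
)


def host_config_name(mac):
    """PXELINUX per-host file name: '01-' (Ethernet) plus the dashed, lowercase MAC."""
    return "01-" + mac.lower().replace(":", "-")


def write_default(cfg_dir):
    """Make the fallback config boot from the local disk."""
    (Path(cfg_dir) / "default").write_text(LOCALBOOT_CFG)


def mark_for_install(cfg_dir, mac, ks_url):
    """Drop a per-host file so only this MAC gets the installer on its next PXE boot."""
    path = Path(cfg_dir) / host_config_name(mac)
    path.write_text(INSTALL_CFG.format(ks_url=ks_url))
    return path


def reset_to_default(cfg_dir, mac):
    """What a CGI called from %post could do: remove the per-host override."""
    (Path(cfg_dir) / host_config_name(mac)).unlink(missing_ok=True)
```

Here `cfg_dir` would be the `pxelinux.cfg` directory under the TFTP root. Because the override is per host, machines that boot at different speeds do not interfere with each other, which was the problem with toggling the shared default.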

The Rocks trick is interesting and possibly the safest way to do it. One
disadvantage I see is that you can't remotely reinstall a PC that is
unable to boot, unless your PCs have "lights-out" remote management that
allows you to change the boot order.

Cheers,
  Sergio


-- 
 Sergio Ballestrero  - http://physics.uj.ac.za/psiwiki/Ballestrero
 University of Johannesburg, Physics Department
 ATLAS TDAQ sysadmin group - Office:75282 OnCall:164851


Re: PXE boot is an infinite reinstall

2011-10-18 Thread Yannick Perret

Steven Timm wrote:

> The trick that Rocks uses is to have a boot order of (hard disk, pxe)
> and then when you want to reinstall, change two bytes in the
> boot sector to make the hard disk unbootable and it will fall through
> to a PXE boot only at that time.
>
> What worker node installs at Fermilab do is to have a DHCP server that
> only answers the PXE request when you want to reinstall, and no other
> time, so the PXE request just times out and then you boot off the hard
> drive.



Here (at CC-IN2P3) we do mostly the same: boot sequence HDD;PXE.

Destroying the partition table works. We also use IPMI. Using IPMI commands
(if your nodes have an IPMI-compatible card) you can use "chassis bootdev
pxe", which tells the node to boot from PXE on the next boot only.
So reinstalling a node (with IPMI configured) consists of "chassis
bootdev pxe" + "chassis power [cycle|on]".
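
The two steps above can be sketched as a small wrapper around ipmitool. The BMC host name and user are placeholders, the function names are made up for the example, and the `lanplus` interface is an assumption (older BMCs may need `lan`).

```python
import subprocess


def ipmi_reinstall_commands(bmc_host, user, interface="lanplus"):
    """Build the two ipmitool invocations: PXE on next boot, then power cycle.

    -E makes ipmitool read the password from the IPMI_PASSWORD environment
    variable, which keeps it off the command line.
    """
    base = ["ipmitool", "-I", interface, "-H", bmc_host, "-U", user, "-E"]
    return [
        base + ["chassis", "bootdev", "pxe"],
        base + ["chassis", "power", "cycle"],
    ]


def reinstall(bmc_host, user, dry_run=True):
    """Print the commands, and run them only when dry_run is False."""
    for cmd in ipmi_reinstall_commands(bmc_host, user):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For a node that is currently powered off, "chassis power on" would replace the power cycle step.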


Regards,
--
Y.


Re: PXE boot is an infinite reinstall

2011-10-18 Thread Felip Moll
If you have Dell servers, there is a "Boot once" option that you can set
through the iDRAC interface. You select Boot once with PXE and then
reboot the machine. It will boot from PXE only one time, so the next
reboot will go through the HD.

Regards