Need help recovering SL4 /boot (not backed up)

2010-12-22 Thread Pann McCuaig
Greetings SL fans,

Sorry for the length of this post, but I'm hoping someone can come to my
rescue and want to provide sufficient context.

Recently I had a hard drive failure on a Sun X4600 box running SL4.8.
The box has four drives; the drive that failed was /boot (only). The
other three drives make up /dev/md0.

The /boot drive was not backed up (headsmack!).

I have created a rescue USB stick based on the System Rescue CD, which
boots via grub. I can boot the System Rescue CD successfully, and it
sees /dev/md0, which I can then mount and read from and write to.

I have attempted to create a kernel and initrd image to add to the USB
stick that will boot the box as if the kernel and initrd image were on
the failed (and now removed) boot drive.

I built the kernel and initrd image on a box similar to the box with the
failed hard drive. I replaced /etc/modprobe.conf with the file from the
target server, and then did 'yum install kernel-largesmp'. I copied all
the resulting kernel-related files from /boot to the USB stick, as well
as the appropriate directory from /lib/modules.

I then restored the helper box to its original state.

I booted the System Rescue CD on the target system and copied the
/lib/modules directory into place on /dev/md0. I fixed up grub/menu.lst
to have a stanza to boot the newly created SL4 kernel. I rebooted the
box, and everything seemed to be going swimmingly, until . . .

. . . the booted kernel seems unable to build /dev/md0 and the boot
process fails.

In the original configuration, the boot drive was /dev/sda, and the
drives making up the soft RAID partition were /dev/sdb, /dev/sdc, and
/dev/sdd.

The System Rescue CD detects the USB stick as /dev/sda and the three SAS
drives as sdb, sdc, and sdd. All is well.

It's not clear to me what is going awry with the SL kernel, but as
the boot verbiage scrolls by, I see /dev/sdc referenced twice, and no
reference to /dev/sdd. When the kernel attempts to assemble /dev/md0, it
uses /dev/sda, /dev/sdb, and /dev/sdc. This fails, so /dev/md0 cannot
be mounted and the kernel panics.

Help, please. Suggestions? Thanks.

BTW, I've put both SL4.8 Disc One and the SL4.8 Live CD on a bootable
USB stick; both boot successfully, but I was unable to find a way to
make either one recognize /dev/md0, much less "rescue" me.

Cheers,
 Pann
-- 
Pann McCuaig 212-854-8689
Systems Coordinator, Economics Department, Columbia University
Department Computing Resources:
   http://www.columbia.edu/cu/economics/computing/


Re: Need help recovering SL4 /boot (not backed up)

2010-12-22 Thread Mark Stodola

If all you lost was /boot, you can probably boot off the install media 
in rescue mode so it mounts things, chroot to /mnt/sysimage/, then 
reinstall the kernel and grub packages as appropriate, make sure grub is 
installed to the MBR or whatever means you use to boot, and reboot.
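
For anyone following along, the steps above might look roughly like the
sketch below, once the install media has been booted with "linux rescue"
and the system is mounted under /mnt/sysimage. Everything specific in it is
an assumption: the replacement /boot partition (/dev/sda1), the location of
the kernel RPM, and the kernel version string are placeholders, and the
helper names are made up. The function only prints the commands unless
DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of a /boot rebuild from rescue mode. Nothing here is taken from
# the poster's machine: BOOT_DEV, KERNEL_RPM and KVER are placeholders.

BOOT_DEV=${BOOT_DEV:-/dev/sda1}                     # assumed: new, formatted /boot partition
KERNEL_RPM=${KERNEL_RPM:-/tmp/kernel-largesmp.rpm}  # assumed: path as seen inside the chroot
KVER=${KVER:-KERNEL-VERSION}                        # placeholder: the installed kernel version
SYSROOT=${SYSROOT:-/mnt/sysimage}

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

rebuild_boot() {
    # Mount the replacement /boot inside the rescued system.
    run mount "$BOOT_DEV" "$SYSROOT/boot"
    # Reinstall the kernel even though the RPM database already lists it.
    run chroot "$SYSROOT" rpm -Uvh --replacepkgs "$KERNEL_RPM"
    # Rebuild the initrd on the real system so it uses that system's config.
    run chroot "$SYSROOT" mkinitrd -f "/boot/initrd-$KVER.img" "$KVER"
    # Put grub back on the whole disk (trailing partition digit stripped).
    run chroot "$SYSROOT" grub-install "${BOOT_DEV%[0-9]}"
}
```

Calling rebuild_boot with the defaults just lists the four commands so they
can be checked first. The grub menu file and the /boot line in fstab would
still need to be recreated by hand.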


-Mark

--
Mr. Mark V. Stodola
Digital Systems Engineer

National Electrostatics Corp.
P.O. Box 620310
Middleton, WI 53562-0310 USA
Phone: (608) 831-7600
Fax: (608) 831-9591


Re: Need help recovering SL4 /boot (not backed up)

2010-12-22 Thread Phong Nguyen
Have you tried determining what's in mdadm.conf in the initrd file? It might be 
getting some incorrect assembly instructions for md0. 



Re: Need help recovering SL4 /boot (not backed up)

2010-12-22 Thread Pann McCuaig
On Wed, Dec 22, 2010 at 13:00, Phong Nguyen wrote:

> Have you tried determining what's in mdadm.conf in the initrd file? It
> might be getting some incorrect assembly instructions for md0.

This may well be the issue. Other than /etc/modprobe.conf (which is
obvious from the mkinitrd man page) and /etc/mdadm.conf, what does the
initrd file look at (or where can I find out)?





Re: Need help recovering SL4 /boot (not backed up)

2010-12-22 Thread Phong Nguyen

On 22 Dec 2010, at 1342, Pann McCuaig wrote:

> On Wed, Dec 22, 2010 at 13:00, Phong Nguyen wrote:
> 
>> Have you tried determining what's in mdadm.conf in the initrd file? It
>> might be getting some incorrect assembly instructions for md0.
> 
> This may well be the issue. Other than /etc/modprobe.conf (which is
> obvious from the mkinitrd man page) and /etc/mdadm.conf, what does the
> initrd file look at (or where can I find out)?

You can extract the initrd with something like the following: 

gzip -dc initrd-file | cpio -id
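
Expanding on that one-liner, here is a small sketch that unpacks the image
into a scratch directory and then searches the unpacked files for anything
mentioning RAID or md devices. The function names and the search pattern
are my own and not part of mkinitrd. On images of this era the startup
script is usually a top-level "init" file, but the search deliberately does
not rely on that.

```shell
#!/bin/sh
# unpack_initrd IMAGE DIR: extract a gzip-compressed cpio initrd into DIR.
unpack_initrd() {
    image=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # cpio -i extracts; -d creates leading directories as needed.
    gzip -dc "$image" | (cd "$dest" && cpio -id)
}

# find_raid_hints DIR: list lines in the unpacked tree that mention RAID,
# an md device, or mdadm, prefixed with file name and line number.
find_raid_hints() {
    grep -rnE 'raid|/dev/md[0-9]|mdadm' "$1"
}

# Typical use (image name and scratch directory are placeholders):
#   unpack_initrd initrd-file /tmp/initrd.d && find_raid_hints /tmp/initrd.d
```

Comparing that output between the helper box and what the target system
expects should show whether the image carries the wrong device list.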



Re: Need help recovering SL4 /boot (not backed up)

2010-12-22 Thread Larry Linder
Advice: Buy a new big disk and use rsync to make a copy of your disks.
Run it every night using cron.
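
Applied to this thread's problem, a nightly copy of /boot onto the RAID
array could be as small as the sketch below. The destination path, the
script location and the schedule are all assumptions. The "cp -a" fallback
is only there for machines without rsync; unlike rsync with --delete, it
does not remove files that have disappeared from the source.

```shell
#!/bin/sh
# backup_tree SRC DEST: copy one directory tree to another, preserving
# permissions, ownership and timestamps.
backup_tree() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    if command -v rsync >/dev/null 2>&1; then
        # -a: archive mode; --delete: drop files no longer present in SRC.
        rsync -a --delete "$src/" "$dest/"
    else
        cp -a "$src/." "$dest/"
    fi
}

# Example crontab line, 02:30 every night (script path is hypothetical):
#   30 2 * * * /usr/local/sbin/boot-backup.sh
#
# where boot-backup.sh sources this file and then calls:
#   backup_tree /boot /var/backups/boot
```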

Our insurance agent had us set him up with a backup on a separate computer.
Thieves broke into his office and took everything.  If we hadn't put his old
disks on the shelf, he would be out of business.

This is where it gets interesting.  If it is like SL5.6, you are done because
you can't become root due to some problem in the code.  All you get is a
couple of messages.  This is a problem I have had after disk failures in
the last year, and fixing it is on my wish list for 6.0.

I really think this needs to be fixed.  

On every Unix and Linux system I have used in the last 30 years, you were
able to become root after a system error.  Even System V.

If you are in a corner you can fix almost any problem, but you need to be
able to set user to root.

Good Luck

Larry Linder



RE: Need help recovering SL4 /boot (not backed up)

2010-12-22 Thread Kinzel, David
>Advice: Buy a new big disk and use rsync to make a copy of disks.
>run it every night using cron.
>
>Our insurance agent has us set him up with a back up on separate computer.
>Thieves broke into his office and took everything.  If we hadn't put his
>old disks on the shelf - he would be out of business.
>
>This is where it gets interesting.  If it is like SL5.6 you are done
>because you can't become root due to some problem in the code.  All you
>get is a couple of messages.  This was a problem I have had due to disk
>failures in the last year and on my wish list for 6.0.

Can you explain what you mean by this?




Re: Need help recovering SL4 /boot (not backed up)

2010-12-23 Thread Larry Linder
During the boot process, the system uses files in /etc such as fstab, the
file system table.  If a disk listed in fstab is not available, the boot
drops you to run level 1.  You used to be able to enter the root password,
modify the files used during boot, save, and reboot.  Now when you enter
the root password you get two notices and it fails.  At this point not even
the rescue stuff works.

To demonstrate the problem: load SL 5.5 on one disk and load your apps on a
second disk.  Once it is up and running, shut it down and remove the data
connection to the second disk.  On boot it fails and drops you to run
level 1.  At this point you are done.  This only happens on SL systems;
SUSE and others work fine.  Without the ability to become root you can't
even use the rescue disk.

I tried it in the wee hours of a Sunday morning, and when it failed I quit
and reloaded the system.  I reloaded the system with no apps, and after
using the disk setup it was fine; I bailed out after it fixed the disk
name and partitions.  Crude but effective.  I immediately changed fstab to
use hardware disk designations and not logicals.  In this system there are
7 disks: one for the OS and a separate one for /usr, /usr/local and /opt.
The rest are /engr, /acc, /sales, etc.  The /home partition is not used and
all "users" are on the other disks.  Or you could put /home on a separate
disk.
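
One way to catch an fstab entry that points at an absent disk before
rebooting is to check that every device named in the file actually exists.
A rough sketch, assuming entries that start with /dev/ (LABEL= and UUID=
entries are skipped, since resolving them needs other tools). The function
name is made up for illustration.

```shell
#!/bin/sh
# missing_fstab_devices [FILE]: print each /dev/... entry in an fstab-style
# file whose device node does not exist. FILE defaults to /etc/fstab.
missing_fstab_devices() {
    fstab=${1:-/etc/fstab}
    # Field 1 is the device, field 2 the mount point. Comment lines and
    # LABEL=/UUID= entries do not start with /dev/ and are ignored.
    awk '$1 ~ /^\/dev\// { print $1, $2 }' "$fstab" |
    while read -r dev mnt; do
        [ -e "$dev" ] || echo "missing: $dev (mounted on $mnt)"
    done
}
```

Run with no argument it checks /etc/fstab. No output means every listed
device node is present.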

If the OS disk fails I can reload it from backup or fresh.
All disks are backed up at 12 AM and 12 PM.  An external backup service
called "spideroak" is used to back up users off site.  It is a lot of work,
but after multiple failures and a bit of bad luck you may need it.

In our case, if we had a complete loss of all data we would lay everyone
off, turn out the lights, lock the door and quit.  We have been building on
these systems for 20+ years.

We love SL because it works and we have uptimes of 6 mo.   There are a few 
bitches from troops but it works and well.  A fine piece of work.

Thank You All
Merry Christmas
Larry Linder

My one wish for SL 6 is to have this problem fixed.

Larry Linder


Re: Need help recovering SL4 /boot (not backed up)

2010-12-24 Thread Jon Peatfield

On Thu, 23 Dec 2010, Larry Linder wrote:

> During the Boot process the files in /etc contain init files such as "fstab"
> file system table.   If a disk listed in this file is not available it drops
> you to run level 1.   You used to be able to modify the init data files used
> during boot.   Enter the root "passwd", modify files, save and reboot.   Now
> when you enter your root "passwd" you get two notices and it fails.  At this
> point not even the rescue stuff works.
> To demo problem - load SL 5.5 on a disk and load your apps on a second disk.
> Once it is up and running shut it down, remove the data connection to the
> second disk.   On boot up it fails and drops you to a run level of 1.   At
> this point you are done.   This only happens on SL systems.  SUSE and others
> work fine.


I'm not sure if this is a troll.

It seems to work for me.  Disks die and we have had our share of them. 
When a non-boot disk dies the boot fails to mount the fs and we end up 
(after a prompt for the root pw) in a shell where we can fix fstab etc.


Note: apart from /boot/ we pretty much use LVM for all fs these days.

If recovery from a failed 'data disk' didn't work there would be many more 
people complaining.  Maybe your setup is unusual in some way.


Booting from the install media should allow access to all the file-systems 
as long as the kernel support is present.  Again it all seems to work as 
expected for me.


  -- Jon

--
"Computers are different from telephones.  Computers do not ring."
  -- A. Tanenbaum, "Computer Networks", p. 32

Jon Peatfield, _Computer_ Officer, DAMTP, University of Cambridge
Mail:  jp...@damtp.cam.ac.uk  Web:  http://www.damtp.cam.ac.uk/


Re: Need help recovering SL4 /boot (not backed up)

2010-12-24 Thread Larry Linder
On Friday 24 December 2010 7:44 am, Jon Peatfield wrote:
> On Thu, 23 Dec 2010, Larry Linder wrote:
> > During the Boot process the files in /etc contain init files such as
> > "fstab" file system table.   If a disk listed in this file is not
> > available it drops you to run level 1.   You used to be able to modify
> > the init data files used during boot.   Enter the root "passwd", modify
> > files, save and reboot.   Now when you enter your root "passwd" you get
> > two notices and it fails.  At this point not even the rescue stuff works.
> > To demo problem - load SL 5.5 on a disk and load your apps on a second
> > disk. Once it is up and running shut it down, remove the data connection
> > to the second disk.   On boot up it fails and drops you to a run level of
> > 1.   At this point you are done.   This only happens on SL systems.  SUSE
> > and others work fine.
>
> I'm not sure if this is a troll.
>
> It seems to work for me.  Disks die and we have had our share of them.
> When a non-boot disk dies the boot fails to mount the fs and we end up
> (after a prompt for the root pw) in a shell where we can fix fstab etc.

That is the way it had always worked for the last 20 years.

> Note: apart from /boot/ we pretty much use LVM for all fs these days.

I plan to try it when the new system is built and SL 6 is ready.

> If recovery from a failed 'data disk' didn't work there would be many more
> people complaining.  Maybe your setup is unusual in some way.

I can reproduce the problem on at least two systems running SL5.6:
one a 32 bit system with a large number of SCSI disks, the other a 64 bit
MATX with 4 SATA disks.  Same problem.

> Booting from the install media should allow access to all the file-systems
> as long as the kernel support is present.  Again it all seems to work as
> expected for me.

A new install always works.

>-- Jon

The problem with setting user to root must have crept in sometime after SL 4
and SL 5.2.
All other systems in the shop are SL 5.6, both 32 and 64 bit, and they do the
same thing.   It's not unique to hardware.


Re: Need help recovering SL4 /boot (not backed up)

2010-12-24 Thread Alan Bartlett
On 24 December 2010 16:09, Larry Linder  wrote:

> All other systems in shop are SL 5.6 both 32 and 64 bit and they do the same 
> thing.

"SL 5.6"?  From where did you get it?

Alan.