Jack Schneider wrote:
> Bob Proulx wrote:
> > Jack Schneider wrote:
> > > I have a raid1 based W/S running Debian Squeeze uptodate. (was
> > > until ~7 days ago) There are 4 drives, 2 of which had never been
> > > used or formatted. I configured a new array using Disk Utility from
> > > a live Ubuntu CD. That's where I screwed up... The end result was
> > > the names of the arrays were changed on the working 2 drives.
> > > I.e., /dev/md0 became /dev/md126 and /dev/md1 became /dev/md127.
> > 
> > Something else must have happened too.  Because normally just adding
> > arrays will not rename the existing arrays.  I am not familiar with
> > the "Disk Utility" that you mention.
>
> Hi, Bob 
> Thanks for your encouraging advice...

I believe you should be able to completely recover from the current
problems.  But it may be tedious and not completely trivial.  You will
just have to work through it.

Now that there is more information available, and knowing that you are
using software raid and lvm, let me guess.  You added another physical
volume (a new /dev/md2 partition) to the root volume group?  If so
that is a common problem.  I have hit it myself on a number of
occasions.  You need to update the mdadm.conf file and rebuild the
initrd.  I will go into more detail about that further down in this
message.

> As I mentioned in a prior post, Grub was leaving me at a grub rescue> prompt.
> 
> I followed this procedure:
> http://www.gnu.org/software/grub/manual/html_node/GRUB-only-offers-a-rescue-shell.html#GRUB-only-offers-a-rescue-shell

That seems reasonable.  It talks about how to drive the grub boot
prompt to manually set up the boot.
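
For anyone following along, the dance at that prompt goes roughly
like this.  The (hd0,1) device name below is only a guess; the output
of ls is what tells you the right one for your machine.

  grub rescue> ls
  grub rescue> set prefix=(hd0,1)/boot/grub
  grub rescue> set root=(hd0,1)
  grub rescue> insmod normal
  grub rescue> normal

After "normal" you should get the full menu back and can boot from
there.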

But you were talking about using a disk utility from a live cd to
configure a new array with two new drives and that is where I was
thinking that you had been modifying the arrays.  It sounded like it
anyway.

Gosh it would be a lot easier if we could just pop in for a quick peek
at the system in person.  But we will just have to make do with the
correspondence course.  :-)

> Booting now leaves me at a busybox shell.  However the Grub menu is
> correct, with the correct kernels.  So it appears that grub is now
> finding the root/boot partitions and files.

That sounds good.  Hopefully not too bad off then.

> > Next time instead you might just use mdadm directly.  It really is
> > quite easy to create new arrays using it.  Here is an example that
> > will create a new device /dev/md9 mirrored from two other devices
> > /dev/sdy5 and /dev/sdz5.
> > 
> >   mdadm --create /dev/md9 --level=mirror \
> >       --raid-devices=2 /dev/sdy5 /dev/sdz5
> > 
> > > Strangely the md2 array which I setup on the added drives remains as
> > > /dev/md2. My root partition is/was on /dev/md0. The result is that
> > > Grub2 fails to boot the / array.

> This is how I created /dev/md2.

Then that explains why it didn't change.  Probably the HOMEHOST
parameter is involved for the ones that changed.  Using mdadm from the
command line doesn't set that parameter.
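
If you are curious you can see what name, if any, the superblocks
record.  With the newer 1.x metadata mdadm --examine prints a
"Name : host:N" line, while the old 0.90 metadata folds the homehost
into the UUID instead, as far as I remember.  The device name below
is only a guess for one of your new drives.

  $ sudo mdadm --examine /dev/sdb1 | grep -i -e name -e uuid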

There was a long discussion about this topic on the list just
recently.  You might want to jump into the middle of it here and read
what we learned about HOMEHOST.

  http://lists.debian.org/debian-user/2010/12/msg01105.html

> mdadm --examine /dev/sda1 & /dev/sda2  gives I think a clean result 
> I have posted the output at : http://pastebin.com/pHpKjgK3

That looks good to me, healthy and normal for that part.

But that is only the first partition.  That is just /dev/md0.  Do you
have any information on the other partitions?

You can look at /proc/partitions to get a list of all of the
partitions that the kernel knows about.

  cat /proc/partitions

Then you can poke at the other ones too.  But it looks like the
filesystems are there okay.
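
For example, if my guess at the layout further down is right then
something like this would show the raid superblocks on the rest of
them.  Substitute whatever device names /proc/partitions actually
reports.

  $ sudo mdadm --examine /dev/sda2 /dev/sdc1 /dev/sdc2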

> mdadm --detail /dev/md0 --> gives  mdadm: md device /dev/md0 does not
> appear to be active. 
> 
> There is no /proc/mdstat  data output.  

So it looks like the raid data is there on the disks but that the
multidevice (md) module is not starting up in the kernel.  Because it
isn't starting, there aren't any /dev/md* devices and no status
output in /proc/mdstat.
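
From the busybox shell you can confirm that with a quick look at what
the kernel has loaded.  If nothing shows up, and the raid drivers are
not built into the kernel, then the modules simply never got loaded.

  grep raid /proc/modules
  grep md_mod /proc/modules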

> > I would boot a rescue image and then inspect the current configuration
> > using the above commands.  Hopefully that will show something wrong
> > that can be fixed after you know what it is.

I still think this is the best course of action for you.  Boot the
system from a rescue disk and then go from there.  Do you have a
Debian install disk #1, a Debian netinst, or another installation
disk?  Any of those will have a rescue system that should boot your
system okay.  The Debian rescue mode will automatically search for
raid partitions and start the md modules.

> So it appears that I must rebuild my arrays.

I think your arrays might be fine.  More information is needed.

You said your boot partition was /dev/md0.  I assume that your root
partition was /dev/md1?  Then you added two new disks as /dev/md2?

  /dev/md0   /dev/sda1  /dev/sdc1

Let me guess at the next two:

  /dev/md1   /dev/sda2  /dev/sdc2  <-- ?? missing info ??
  /dev/md2   /dev/sdb1  /dev/sdd1  <-- ?? missing info ??

Are those even close to being correct?

> I think I can munge thru the mdadm man pages or Debian Reference to
> get the tasks.

If you only have the Ubuntu live cd system then you can boot it up
from the cdrom and then use it to get the arrays started.  I still
think using a rescue disk is better.  But the live cd you mentioned
using before can also boot the machine and be used for system repair.
You will probably need to manually load the md and dm modules.

  $ sudo modprobe md_mod
  $ sudo modprobe dm_mod

And then after those have been loaded the kernel should create the
/proc/mdstat interface.  It is only present after the driver is
loaded.

  $ cat /proc/mdstat

I am hoping that it will show some good information about the state
of things at that point.  Since you were able to post the output from
mdadm --examine, hopefully you will be able to post this too.
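
From the live cd you can also just ask mdadm to hunt for every array
it can find on its own, which is often the quickest way to see what
it thinks is out there.

  $ sudo mdadm --assemble --scan --verbose

But assembling each array explicitly, as below, is more predictable.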

To manually start an array you would assemble the existing components.

  mdadm  --assemble /dev/md0   /dev/sda1 /dev/sdc1

And if we knew the disk partitions of the other arrays then we could
assemble them too.  Hopefully.  I have my fingers crossed for you.
It would be something similar to the above but with /dev/md1 and the
two partitions for that array.
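
In other words, if my guess at the partitioning above is right,
something like this, but do check the device names against the
mdadm --examine output first.

  mdadm  --assemble /dev/md1   /dev/sda2 /dev/sdc2  ## devices are a guess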

Or if the array is claiming that it is failed (if that has happened)
then you could --add the partitions back in.

  mdadm  --manage /dev/md1   --add /dev/sdc2  ## for example only

After being added back in, the array will be sync'ing the data
between the members.  This sync can be monitored by looking at
/proc/mdstat.

  cat /proc/mdstat

If you get to that point and things are sync'ing then I would tend to
let that finish before doing anything else.
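
If you want to keep an eye on the progress without retyping that,
watch works nicely on the installed system (the busybox environment
may not have it).

  watch -n 5 cat /proc/mdstat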

Note that there is a file /etc/mdadm/mdadm.conf that contains the
UUIDs (and in previous releases also the /dev/sda1 type devices) of
the arrays.  If things have changed then that file will need to be
updated.

That file /etc/mdadm/mdadm.conf is used to build the mdadm.conf file
in the boot initrd (initial ramdisk).  If it changes for the root
volume or for any volume used by lvm then the initrd will need to be
rebuilt in order to update that copy.

Let me say this again because it is important.  If you add a partition
to the root lvm volume group then you must rebuild the initrd or your
system will not boot.  I have been there and done that myself on more
than one occasion.

  dpkg-reconfigure linux-image-2.6.32-5-686   # choose your installed kernel package

The easiest way to do this is to boot a Debian rescue cd, use it to
get into your installed system, update the /etc/mdadm/mdadm.conf
file, and then issue the above dpkg-reconfigure command.

You can get the information for the /etc/mdadm/mdadm.conf file by
using the mdadm command to scan for it.

  $ sudo mdadm --detail --scan

Use that information to update the /etc/mdadm/mdadm.conf file.
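
Putting those last two pieces together, the whole sequence from a
root shell on your system (for example from the rescue environment)
would look roughly like this.  Review the file after appending since
any old ARRAY lines will need to be removed, and substitute whichever
kernel package you actually have installed.

  mdadm --detail --scan >> /etc/mdadm/mdadm.conf
  editor /etc/mdadm/mdadm.conf     # remove any stale ARRAY lines
  dpkg-reconfigure linux-image-2.6.32-5-686

Running update-initramfs -u is another way to rebuild the initrd if
you prefer.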

> Thanks for the help..  You need all the help you can get at 76yrs..

I am still a few decades behind you but we are all heading that same
direction.  :-) Patience, persistence and tenacity will get you
through it.  You are doing well.  Just keep working the problem.

Bob
