Linux MD problem

2009-07-07 Thread Mike Lovell
I have a machine that has 4 disks in a raid 10 using md. The machine 
went through an unclean shutdown yesterday and when the box came up, I 
saw errors like the following in the kernel log and the array no longer 
works.

[   28.575149] md: raid10 personality registered for level 10
[   28.610827] md: md0 stopped.
[   28.688678] md: bind
[   28.688981] md: bind
[   28.689269] md: bind
[   28.689566] md: bind
[   28.689748] md: kicking non-fresh sdw1 from array!
[   28.689890] md: unbind<sdw1>
[   28.690036] md: export_rdev(sdw1)
[   28.690175] md: kicking non-fresh sdv1 from array!
[   28.690316] md: unbind<sdv1>
[   28.690452] md: export_rdev(sdv1)
[   28.690589] md: kicking non-fresh sdu1 from array!
[   28.690730] md: unbind<sdu1>
[   28.690866] md: export_rdev(sdu1)
[   28.704706] raid10: not enough operational mirrors for md0
[   28.704706] md: pers->run() failed ...

Anyone have some suggestions on what can be done to get the array 
working again? There are a lot of virtual machines that are on the array 
that I really don't want to have to go through and rebuild. Thanks in 
advance for any advice you guys have.

mike

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/


new hardware exchange type group

2009-07-07 Thread Kyle Waters
I've seen people giving away a lot of still useful hardware in this 
group and thought you would want to know about this new effort by Pete 
Ashdown:

http://peteashdown.org/journal/2009/07/07/the-electroregeneration-society/

Kyle



Re: new hardware exchange type group

2009-07-07 Thread Kyle Waters
On 07/07/2009 01:08 PM, Kyle Waters wrote:
> I've seen people giving away a lot of still useful hardware in this
> group and thought you would want to know about this new effort by Pete
> Ashdown:
>
> http://peteashdown.org/journal/2009/07/07/the-electroregeneration-society/
>
> Kyle
>
>


It turns out I skimmed over the most important part of the post: these 
computers are to be given primarily to non-profits and the disabled.

Kyle



Re: Linux MD problem

2009-07-07 Thread Nicholas Leippe
I can't help, but am really interested to know your solution when you get it.

I have avoided the raid10 personality since I could never confirm the 
actual layout well enough to know that it really was mirroring. When I 
last tried it, all the output it logged, including when failing and 
adding devices, implied it was only striping, so I was never convinced 
it worked, which was unacceptable for an HA environment. Perhaps they 
could improve the usability of mdadm.

Instead, I have always done my own raid10 by layering raid0 on top of 
raid1 mirrors. This way I can make the physical topology anything I 
want, such as mirroring or striping across different buses. It also 
makes it possible to boot from the array.
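
For anyone who wants to try that layered setup, a rough mdadm sketch 
(the device names and md numbers here are just for illustration, not my 
actual layout):

# two raid1 mirrors, ideally with each pair on a different bus or controller
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
# then stripe across the two mirrors to get raid1+0
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2

The nested arrays show up in /proc/mdstat like any other md device, so 
monitoring doesn't change.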





Re: Linux MD problem

2009-07-07 Thread Kenneth Burgener
On 7/7/2009 1:03 PM, Mike Lovell wrote:
> I have a machine that has 4 disks in a raid 10 using md.
>
> [   28.575149] md: raid10 personality registered for level 10
> [   28.610827] md: md0 stopped.
> [   28.688678] md: bind
> [   28.688981] md: bind
> [   28.689269] md: bind
> [   28.689566] md: bind

Are you able to boot into the OS?  What does 'cat /proc/mdstat' show?  
What does 'mdadm --examine /dev/sdu1' (or sdv, sdw, sdx) show?  Normally, 
if only one disk has failed, the array should still activate, just in a 
degraded state.  For some reason your system thinks that sdu, sdv, and 
sdw are all in an invalid state, which means there are not enough 
devices to reassemble the array.  I haven't seen the "non-fresh" error 
before.  It could simply mean mdadm refused to assemble the array 
because those members are slightly out of date or out of sequence.  As 
a last resort you could try to forcefully reassemble the array (no 
guarantees):

mdadm --examine /dev/sdu1 | grep -i uuid
# copy and paste the uuid into the following
mdadm --assemble /dev/md0 --force --uuid=[UUID_from_previous_command]
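
If you want to see why those three were considered non-fresh, comparing 
the superblocks should show it.  Something like this, using the device 
names from your log:

# the kicked members should show a lower Events count (and an older
# Update Time) than the member that stayed in the array
for d in /dev/sd[uvwx]1; do
  echo "== $d =="
  mdadm --examine "$d" | grep -iE 'events|update time|state'
done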



Kenneth




Re: Linux MD problem

2009-07-07 Thread Shane Hathaway
Mike Lovell wrote:
> I have a machine that has 4 disks in a raid 10 using md. The machine 
> went through an unclean shutdown yesterday and when the box came up, I 
> saw errors like the following in the kernel log and the array no longer 
> works.
> 
> [   28.575149] md: raid10 personality registered for level 10
> [   28.610827] md: md0 stopped.
> [   28.688678] md: bind
> [   28.688981] md: bind
> [   28.689269] md: bind
> [   28.689566] md: bind
> [   28.689748] md: kicking non-fresh sdw1 from array!
> [   28.689890] md: unbind<sdw1>
> [   28.690036] md: export_rdev(sdw1)
> [   28.690175] md: kicking non-fresh sdv1 from array!
> [   28.690316] md: unbind<sdv1>
> [   28.690452] md: export_rdev(sdv1)
> [   28.690589] md: kicking non-fresh sdu1 from array!
> [   28.690730] md: unbind<sdu1>
> [   28.690866] md: export_rdev(sdu1)
> [   28.704706] raid10: not enough operational mirrors for md0
> [   28.704706] md: pers->run() failed ...
> 
> Anyone have some suggestions on what can be done to get the array 
> working again? There are a lot of virtual machines that are on the array 
> that I really don't want to have to go through and rebuild. Thanks in 
> advance for any advice you guys have.

I would try assembling the array without sdx1:

   mdadm --assemble --run /dev/md0 /dev/sd[uvw]1

If that works, I would check the filesystem for errors *without 
repairing anything yet*, since I don't want to change any bits at this 
point:

   e2fsck -n -f /dev/md0

If that shows no errors, or a small number of easily repaired errors, 
then I would try to re-add sdx1:

   mdadm --manage --re-add /dev/md0 /dev/sdx1

If that second command fails, I would use "--add" instead of "--re-add", 
causing sdx1 to be rebuilt from scratch.

Then I would perform the real filesystem repair:

   e2fsck -p -f /dev/md0

If the first command I gave doesn't work, I would use "dd" to copy the 
partitions to backup drives before doing anything else, then I would do 
more aggressive things like "mdadm --assemble --force --update resync".
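
A rough sketch of that dd step, assuming something with enough space is 
mounted (the /mnt/backup path is just an example):

   # image each member before anything destructive, so the current state
   # can be put back if a forced reassembly makes things worse
   for d in u v w x; do
      dd if=/dev/sd${d}1 of=/mnt/backup/sd${d}1.img bs=1M conv=noerror,sync
   done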

Shane
