Re: [CentOS] DegradedArray message

2014-12-09 Thread Gordon Messmer

On 12/08/2014 08:35 PM, David McGuffey wrote:

I'll still get ready for another failure. Will read up on the best
methods to have an encrypted filesystem on top of raid-1.


I'm pretty sure that if you tell the Fedora installer to build an 
encrypted RAID1 system, you'll get exactly what I described previously.  
In detail:


sda1 - 512MB
sda2 - remainder of disk

sdb1 - 512MB
sdb2 - remainder of disk

md0 - RAID1 including sda1 and sdb1
md1 - RAID1 including sda2 and sdb2

/boot - filesystem on md0

luks-$(uuid) - encrypted block device on md1

pv.01 - LVM2 physical volume on luks-$(uuid)

fedora_$(hostname) - LVM2 volume group including "pv.01"

swap - swap on logical volume
root - filesystem on logical volume
home - filesystem on logical volume

If you replace a disk, you'll need to partition it correctly and "mdadm 
--add" it to the two RAID volumes.
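As a rough sketch (device names are assumptions; confirm yours with lsblk
and /proc/mdstat first), if the surviving disk is /dev/sda and the
replacement shows up as /dev/sdb, that might look like:

# copy the partition table from the surviving disk
# (sfdisk for MBR; something like "sgdisk -R /dev/sdb /dev/sda" for GPT)
sfdisk -d /dev/sda | sfdisk /dev/sdb

# add the new partitions to both arrays
mdadm /dev/md0 --add /dev/sdb1
mdadm /dev/md1 --add /dev/sdb2

# reinstall the boot loader on the new disk
# (grub2-install on current Fedora/CentOS 7; grub-install on older releases)
grub2-install /dev/sdb

# watch the resync
cat /proc/mdstat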


If you manually set up some other layering, replacing a disk will 
probably be more involved.



Re: [CentOS] DegradedArray message

2014-12-08 Thread David McGuffey
On Mon, 2014-12-08 at 21:11 -0500, David McGuffey wrote:
> On Thu, 2014-12-04 at 16:46 -0800, Gordon Messmer wrote:
> > On 12/04/2014 05:45 AM, David McGuffey wrote:
> 
> > In practice, however, there's a bunch of information you didn't provide, 
> > so some of those steps are wrong.
> > 
> > I'm not sure what dm-0, dm-2 and dm-3 are, but they're indicated in your 
> > mdstat.  I'm guessing that you made partitions, and then made LVM or 
> > crypto devices, and then did RAID on top of that.  If either of those 
> > are correct, that's completely the wrong way to build RAID sets.  You 
> > risk either bad performance from doing crypto more often than is 
> > required, or possibly corruption as a result of LVM not mapping blocks 
> > the way you expect.
> > 
> > If you build software RAID, I really strongly recommend that you keep it 
> > as simple as possible.  That means a) build software RAID sets from raw 
> > partitions and b) use as few partitions as possible.
> > 
> 
> Gordon,
> 
> Agree, I've probably made it too complicated. It is a workstation with
> sensitive data on it so I've encrypted the partitions.
> 
> md1 is fairly simple...two large disks in raid1, encrypted, and mounted
> as /home.
> 
> md0 is probably way too complicated and not a good way to go.  The
> sensitive data in md0 is in /var (virtual machines).
> 
> I've backed up both /home and /var/lib/libvirt/images, so I think I'll
> start over on md0 with a new disk and a fresh install.
> 
> Dave
> 
Armed with a backup, I decided to use the disk utility GUI to check the
array and then re-attach the disk. After a rebuild phase it reattached
and the state changed to 'clean.' I rebooted to see if it would stay
attached; it did.

I'll still get ready for another failure. Will read up on the best
methods to have an encrypted filesystem on top of raid-1.

Dave M




Re: [CentOS] DegradedArray message

2014-12-08 Thread David McGuffey
On Thu, 2014-12-04 at 16:46 -0800, Gordon Messmer wrote:
> On 12/04/2014 05:45 AM, David McGuffey wrote:

> In practice, however, there's a bunch of information you didn't provide, 
> so some of those steps are wrong.
> 
> I'm not sure what dm-0, dm-2 and dm-3 are, but they're indicated in your 
> mdstat.  I'm guessing that you made partitions, and then made LVM or 
> crypto devices, and then did RAID on top of that.  If either of those 
> are correct, that's completely the wrong way to build RAID sets.  You 
> risk either bad performance from doing crypto more often than is 
> required, or possibly corruption as a result of LVM not mapping blocks 
> the way you expect.
> 
> If you build software RAID, I really strongly recommend that you keep it 
> as simple as possible.  That means a) build software RAID sets from raw 
> partitions and b) use as few partitions as possible.
> 

Gordon,

Agree, I've probably made it too complicated. It is a workstation with
sensitive data on it so I've encrypted the partitions.

md1 is fairly simple...two large disks in raid1, encrypted, and mounted
as /home.

md0 is probably way too complicated and not a good way to go.  The
sensitive data in md0 is in /var (virtual machines).

I've backed up both /home and /var/lib/libvirt/images, so I think I'll
start over on md0 with a new disk and a fresh install.

Dave



Re: [CentOS] DegradedArray message

2014-12-04 Thread Gordon Messmer

On 12/04/2014 05:45 AM, David McGuffey wrote:

md0 is made up of two 250G disks on which the OS and a very large /var
partition (for a number of virtual machines) reside.

...

The challenge is that disk 0 of md0 is the problem, and it has a 524M /boot
partition outside of the RAID partition.


Assuming that you have an unused drive port, you can fix that pretty easily.

Attach a new replacement disk to the unused port.  Let's say that it 
comes up as /dev/sde.


Copy the partition table to it (unless it's GPT, in which case use parted):
sfdisk -d /dev/sda | sfdisk /dev/sde

Unmount /boot and copy that partition (assuming that it is sda1):
umount /boot
dd if=/dev/sda1 of=/dev/sde1 bs=1M

Install grub on the new drive:
grub-install /dev/sde

At that point, you should be able to also add the new partition to the 
md array:

mdadm /dev/md0 --add /dev/sde2

Once it rebuilds, shut down.  Remove the bad drive.  Put the new drive 
in its place.  In theory the system will boot and be whole.
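To keep an eye on the rebuild before shutting down, something like this
should do (reusing the device names above):

cat /proc/mdstat
mdadm --detail /dev/md0
watch -n 5 cat /proc/mdstat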


In practice, however, there's a bunch of information you didn't provide, 
so some of those steps are wrong.


I'm not sure what dm-0, dm-2 and dm-3 are, but they're indicated in your 
mdstat.  I'm guessing that you made partitions, and then made LVM or 
crypto devices, and then did RAID on top of that.  If either of those 
are correct, that's completely the wrong way to build RAID sets.  You 
risk either bad performance from doing crypto more often than is 
required, or possibly corruption as a result of LVM not mapping blocks 
the way you expect.


If you build software RAID, I really strongly recommend that you keep it 
as simple as possible.  That means a) build software RAID sets from raw 
partitions and b) use as few partitions as possible.


Typically, I'll create two partitions on all disks.  The first is a 
small partition for /boot, which may be part of a RAID1 set or may be 
unused.  The second partition covers the rest of the drive and will be 
used in whatever arrangement is suitable for that system, whether it's 
RAID1, RAID5, or RAID10.  All of the drives are consistent, so there's 
always a place to copy /boot, and just one script or process to set up 
new disks regardless of their position in the array.  md0 is used for 
/boot, and md1 is an LVM PV.  All of the filesystems other than /boot 
are LVs.
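Very roughly, building that layout by hand would look something like the
sketch below (disk names, the volume group name and the LV sizes are just
placeholders; an installer does the equivalent for you):

# each disk already carries a small partition 1 and a large partition 2
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2

mkfs.ext4 /dev/md0                 # /boot
pvcreate /dev/md1                  # md1 becomes the single LVM PV
vgcreate vg_sys /dev/md1
lvcreate -L 8G  -n swap vg_sys
lvcreate -L 50G -n root vg_sys
lvcreate -l 100%FREE -n home vg_sys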


Hopefully btrfs will become the default fs in the near future and all of 
this will be vastly simplified.




Re: [CentOS] DegradedArray message

2014-12-04 Thread David McGuffey
Thanks for all the responses.  A little more digging revealed:

md0 is made up of two 250G disks on which the OS and a very large /var
partition (for a number of virtual machines) reside.

md1 is made up of two 2T disks on which /home resides.

The challenge is that disk 0 of md0 is the problem, and it has a 524M /boot
partition outside of the RAID partition.

My plan is to back up /home (md1) and at a minimum /etc/libvirt
and /var/lib/libvirt (md0) before I do anything else.

Here are the log entries for 'raid':

Dec  1 20:50:15 desk4 kernel: md/raid1:md1: not clean -- starting
background reconstruction
Dec  1 20:50:15 desk4 kernel: md/raid1:md1: active with 2 out of 2
mirrors
Dec  1 20:50:15 desk4 kernel: md/raid1:md0: active with 1 out of 2
mirrors

This is a desktop, not a server. We've had several short (<20 sec) power
outages over the last month. The last one was on 1 Dec. I suspect the
sudden loss and restoration of power could have trashed a portion of
disk 0 in md0.

I finally obtained an APC UPS (BX1500G), installed, configured, and
tested it. In the future, it will carry me through these short outages.

I'll obtain a new 250G (or larger) drive and start rooting around for
guidance on how to replace a drive with the MBR and /boot on it.

On Wed, 2014-12-03 at 22:11 +0100, Leon Fauster wrote:
> Hi David,
> 
> Am 03.12.2014 um 02:14 schrieb David McGuffey :
> > This is an automatically generated mail message from mdadm
> > running on desk4
> > 
> > A DegradedArray event had been detected on md device /dev/md0.
> > 
> > Faithfully yours, etc.
> > 
> > P.S. The /proc/mdstat file currently contains the following:
> > 
> > Personalities : [raid1] 
> > md0 : active raid1 dm-2[1]
> >  243682172 blocks super 1.1 [2/1] [_U]
> >  bitmap: 2/2 pages [8KB], 65536KB chunk
> > 
> > md1 : active raid1 dm-3[0] dm-0[1]
> >  1953510268 blocks super 1.1 [2/2] [UU]
> >  bitmap: 3/15 pages [12KB], 65536KB chunk
> 
> 
> The reason why one drive was kicked out (the [_U] above) will be in
> /var/log/messages. If that drive is also part of md1, it should be
> manually removed from md1 before replacing the HD.
> 
> --
> LF
> 
> 
> 
> 
> 




Re: [CentOS] DegradedArray message

2014-12-03 Thread Leon Fauster
Hi David,

Am 03.12.2014 um 02:14 schrieb David McGuffey :
> This is an automatically generated mail message from mdadm
> running on desk4
> 
> A DegradedArray event had been detected on md device /dev/md0.
> 
> Faithfully yours, etc.
> 
> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [raid1] 
> md0 : active raid1 dm-2[1]
>  243682172 blocks super 1.1 [2/1] [_U]
>  bitmap: 2/2 pages [8KB], 65536KB chunk
> 
> md1 : active raid1 dm-3[0] dm-0[1]
>  1953510268 blocks super 1.1 [2/2] [UU]
>  bitmap: 3/15 pages [12KB], 65536KB chunk


The reason why one drive was kicked out (the [_U] above) will be in
/var/log/messages. If that drive is also part of md1, it should be
manually removed from md1 before replacing the HD.
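In case it is, failing and removing that member by hand would look roughly
like this (dm-3 is used purely as an example; the member name is whatever
/proc/mdstat shows for md1 -- a raw partition, or a dm-* device in a
layered setup like this one):

mdadm /dev/md1 --fail /dev/dm-3
mdadm /dev/md1 --remove /dev/dm-3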

--
LF







Re: [CentOS] DegradedArray message

2014-12-03 Thread SilverTip257
On Wed, Dec 3, 2014 at 1:44 AM, Mogens Kjaer  wrote:

> On 12/03/2014 03:24 AM, Fred Smith wrote:
>
>> OTOH, I had a perfectly good drive get kicked out of my RAID-1
>> array a few years ago just because, well, I guess I could say
>> "it felt like it".
>>
>
> I've seen that too several times on my home "server".
>

I have also seen drives/partitions get kicked out of softraid arrays for
unknown reasons.
My first choice is to re-add the member into the softraid and see 1) if it
will re-add and 2) if it stays a member.
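For the archives, the re-add itself is just (sdX1 standing in for whichever
member was kicked out; check dmesg and /proc/mdstat first):

mdadm /dev/md0 --re-add /dev/sdX1
# if --re-add is refused, a plain --add triggers a full resync instead
mdadm /dev/md0 --add /dev/sdX1
cat /proc/mdstat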

As far as re-adding drives to a softraid array, I've written about it on
this list a few times before.  Plus the URLs others have provided will help
you as well.  Any questions, just ask and somebody will reply.

http://lists.centos.org/pipermail/centos/2013-November/138699.html
http://lists.centos.org/pipermail/centos/2014-February/140700.html

Good luck.
Digimer's advice about testing your scenario out on a VM is a perfect one,
since you're new to the task.


-- 
---~~.~~---
Mike
//  SilverTip257  //


Re: [CentOS] DegradedArray message

2014-12-02 Thread Mogens Kjaer

On 12/03/2014 03:24 AM, Fred Smith wrote:

OTOH, I had a perfectly good drive get kicked out of my RAID-1
array a few years ago just because, well, I guess I could say
"it felt like it".


I've seen that too several times on my home "server".

Once in a while (usually on one of the first days I'm on vacation) one
of the drives stops responding.

A shutdown and cold restart is necessary to bring the drive alive again,
just a reboot won't fix it.

After this, I rebuild the RAID partitions and all is OK.

smartctl shows no sign of problems with the drive, so I suspect a
controller problem.

This is on a desktop machine used as a server.

I guess this explains why we have server grade hardware :-)

Mogens

--
Mogens Kjaer, m...@lemo.dk
http://www.lemo.dk


Re: [CentOS] DegradedArray message

2014-12-02 Thread John R Pierce

On 12/2/2014 6:24 PM, Fred Smith wrote:

In reality, I had (in my ignorance) purchased a pair of WD
drives that aren't intended to be used in a RAID array, and
once in a long while (that was actually the only such instance
in the 4-5 years I've had that RAID array) it doesn't respond to
some HD command or other and gets dropped.


Desktop-class SATA drives will report 'write successful' while there's
still data in their buffers, so the RAID will happily continue; then, if the
drive actually gets an unrecoverable write error, things are toast and the
RAID is out of sync.


This is a major reason I'm leaning towards using ZFS for future RAIDs
(primarily via FreeBSD): ZFS checksums and timestamps every block it
writes.  Where a regular RAID can only say "something is wrong here, but
what it is I ain't exactly sure", ZFS can look at the two blocks and go
"A is good, B is bad/stale, let's replicate A back to B" as part of the
zpool 'scrub' process.
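For what it's worth, the scrub itself is a one-liner (assuming a pool named
"tank"):

zpool scrub tank
zpool status tank   # shows scrub progress and any blocks that were repaired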



--
john r pierce  37N 122W
somewhere on the middle of the left coast



Re: [CentOS] DegradedArray message

2014-12-02 Thread Keith Keller
On 2014-12-03, David McGuffey  wrote:
>
> Appears to me that device 0 (/dev/dm-2) on md0 has been removed because
> of problems.

That looks about right.  There may be more error messages in your system
logs (e.g., /var/log/messages, dmesg), which might tell you more about
the nature of the failure.

> This is my first encounter with a raid failure. I suspect I should
> replace disk 0 and let the raid rebuild itself.
>
> Seeking guidance and a good source for the procedures.

The Linux RAID wiki is often a good (though sometimes dated) resource.

https://raid.wiki.kernel.org/index.php/Reconstruction

If you wish to attempt a hot swap, make *sure* you pull the correct
device!  If you're not sure, or not sure your system even supports it,
it's safer to power down to do the swap.  You should still verify which
drive has failed before shutting down, though picking the wrong one is less
catastrophic with the system powered off.
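A few commands help map the failed md member to a physical drive before
pulling anything (a sketch; adjust device names to your system):

cat /proc/mdstat              # which array is degraded ([_U])
mdadm --detail /dev/md0       # which member is "removed" or "faulty"
ls -l /dev/disk/by-id/        # map sdX to a drive model/serial number
smartctl -i /dev/sdX          # confirm the serial of the suspect drive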

As long as you are careful, reconstructing a degraded RAID is usually
pretty straightforward.

--keith

-- 
kkel...@wombat.san-francisco.ca.us




Re: [CentOS] DegradedArray message

2014-12-02 Thread Fred Smith
On Tue, Dec 02, 2014 at 08:14:19PM -0500, David McGuffey wrote:
> Received the following message in mail to root:
> 
> Message 257:
> From root@desk4.localdomain  Tue Oct 28 07:25:37 2014
> Return-Path: 
> X-Original-To: root
> Delivered-To: root@desk4.localdomain
> From: mdadm monitoring 
> To: root@desk4.localdomain
> Subject: DegradedArray event on /dev/md0:desk4
> Date: Tue, 28 Oct 2014 07:25:27 -0400 (EDT)
> Status: RO
> 
> This is an automatically generated mail message from mdadm
> running on desk4
> 
> A DegradedArray event had been detected on md device /dev/md0.
> 
> Faithfully yours, etc.
> 
> P.S. The /proc/mdstat file currently contains the following:
> 
> Personalities : [raid1] 
> md0 : active raid1 dm-2[1]
>   243682172 blocks super 1.1 [2/1] [_U]
>   bitmap: 2/2 pages [8KB], 65536KB chunk
> 
> md1 : active raid1 dm-3[0] dm-0[1]
>   1953510268 blocks super 1.1 [2/2] [UU]
>   bitmap: 3/15 pages [12KB], 65536KB chunk
> 
> unused devices: 

Could be a bad drive, as Digimer alludes to in his reply.

OTOH, I had a perfectly good drive get kicked out of my RAID-1
array a few years ago just because, well, I guess I could say
"it felt like it".

In reality, I had (in my ignorance) purchased a pair of WD
drives that aren't intended to be used in a RAID array, and
once in a long while (that was actually the only such instance
in the 4-5 years I've had that RAID array) it doesn't respond to
some HD command or other and gets dropped.

It turned out to be easy to reinsert it, and it ran for a long time
thereafter without trouble.

I can dig for the info on the drives and the nature of the
problem if anyone wants to see it.

Fred



-- 
 Fred Smith -- fre...@fcshome.stoneham.ma.us -
The Lord is like a strong tower. 
 Those who do what is right can run to him for safety.
--- Proverbs 18:10 (niv) -


Re: [CentOS] DegradedArray message

2014-12-02 Thread Digimer

On 02/12/14 08:14 PM, David McGuffey wrote:

Received the following message in mail to root:

Message 257:
 From root@desk4.localdomain  Tue Oct 28 07:25:37 2014
Return-Path: 
X-Original-To: root
Delivered-To: root@desk4.localdomain
From: mdadm monitoring 
To: root@desk4.localdomain
Subject: DegradedArray event on /dev/md0:desk4
Date: Tue, 28 Oct 2014 07:25:27 -0400 (EDT)
Status: RO

This is an automatically generated mail message from mdadm
running on desk4

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1]
md0 : active raid1 dm-2[1]
   243682172 blocks super 1.1 [2/1] [_U]
   bitmap: 2/2 pages [8KB], 65536KB chunk

md1 : active raid1 dm-3[0] dm-0[1]
   1953510268 blocks super 1.1 [2/2] [UU]
   bitmap: 3/15 pages [12KB], 65536KB chunk

unused devices: 

& q
Held 314 messages in /var/spool/mail/root
You have mail in /var/spool/mail/root

Ran an mdadm query against both RAID arrays:

[root@desk4 ~]# mdadm --query --detail /dev/md0
/dev/md0:
 Version : 1.1
   Creation Time : Thu Nov 15 19:24:17 2012
  Raid Level : raid1
  Array Size : 243682172 (232.39 GiB 249.53 GB)
   Used Dev Size : 243682172 (232.39 GiB 249.53 GB)
Raid Devices : 2
   Total Devices : 1
 Persistence : Superblock is persistent

   Intent Bitmap : Internal

 Update Time : Tue Dec  2 20:02:55 2014
   State : active, degraded
  Active Devices : 1
Working Devices : 1
  Failed Devices : 0
   Spare Devices : 0

Name : desk4.localdomain:0
UUID : 29f70093:ae78cf9f:0ab7c1cd:e380f50b
  Events : 266241

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1     253        3        1      active sync   /dev/dm-3

[root@desk4 ~]# [root@desk4 ~]# mdadm --query --detail /dev/md1
/dev/md1:
 Version : 1.1
   Creation Time : Thu Nov 15 19:24:19 2012
  Raid Level : raid1
  Array Size : 1953510268 (1863.01 GiB 2000.39 GB)
   Used Dev Size : 1953510268 (1863.01 GiB 2000.39 GB)
Raid Devices : 2
   Total Devices : 2
 Persistence : Superblock is persistent

   Intent Bitmap : Internal

 Update Time : Tue Dec  2 20:06:21 2014
   State : active
  Active Devices : 2
Working Devices : 2
  Failed Devices : 0
   Spare Devices : 0

Name : desk4.localdomain:1
UUID : 1bef270d:36301a24:7b93c7a9:a2a95879
  Events : 108306

    Number   Major   Minor   RaidDevice State
       0     253        0        0      active sync   /dev/dm-0
       1     253        1        1      active sync   /dev/dm-1
[root@desk4 ~]#

Appears to me that device 0 (/dev/dm-2) on md0 has been removed because
of problems.

This is my first encounter with a raid failure. I suspect I should
replace disk 0 and let the raid rebuild itself.

Seeking guidance and a good source for the procedures.

Dave M


In short, buy a replacement disk of equal or greater size, create matching
partitions, and then use mdadm to add the replacement partition (of
appropriate size) back into the array.


An example command to add a replacement partition would be:

mdadm --manage /dev/md0 --add /dev/sda1

I strongly recommend creating a virtual machine with a pair of virtual 
disks and simulating the replacement of the drive before trying it out 
on your real system. In any case, be sure to have good backups 
(immediately).
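If a full VM feels like overkill, the same drill can also be practiced with
a throwaway array on loop devices (a rough sketch, not Digimer's procedure;
the image files and /dev/md9 are made-up names):

truncate -s 512M /tmp/d0.img /tmp/d1.img
losetup /dev/loop0 /tmp/d0.img
losetup /dev/loop1 /tmp/d1.img
mdadm --create /dev/md9 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1

# simulate a failure, then practice the remove/add cycle
mdadm /dev/md9 --fail /dev/loop1
mdadm /dev/md9 --remove /dev/loop1
mdadm /dev/md9 --add /dev/loop1
cat /proc/mdstat

# clean up
mdadm --stop /dev/md9
losetup -d /dev/loop0 /dev/loop1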


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
