raid10 messed up by false multipath setup (was: raid10 messed up filesystem, lvm lv ok)

2008-01-21 Thread Ask Bjørn Hansen


On Jan 19, 2008, at 3:44 AM, Ask Bjørn Hansen wrote:

Replying to myself with an update, mostly for the sake of the archives  
(I went through the linux-raid mail from the last year yesterday while  
waiting for my raw-partition backups to finish).


I mentioned[1] my trouble with the multipath detection code in the
Fedora rescue mode messing up my raid yesterday.

[...]
I suspect that maybe the layout of the md device got messed up?  How
can I find out if that's the case?  Would it be possible to recover
from it (assuming all the data is still on some of the disks)?


I realized that of the 11 disks (9 in the raid, 2 spares), one of the
disks affected by the fake multipath mishap was a spare, so after
backing up all the raw partitions[2] I re-created the raid in place
with the other affected disk marked missing, and the filesystem seems
more or less okay.  Yes, I'm doing a backup now.  :-)
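
A rough sketch of that in-place re-create, for the archives -- the
device names, chunk size and layout below are made up; the real values
have to come from saved mdadm -E output of the original array:

  mdadm --create /dev/md0 --assume-clean \
        --level=10 --raid-devices=9 --chunk=64 --layout=n2 \
        /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sdd5 missing \
        /dev/sdf5 /dev/sdg5 /dev/sdh5 /dev/sdi5

Get the device order or geometry wrong and the result is garbage, which
is why the saved -E output and the raw-partition backups matter.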


Lessons:

  1) Do backups of your raid'ed data.   Yes, it can be a pain but  
figure it out.


  2) Keep your root partition on a simple raid1 (or on an LVM group
that's on a simple raid1).


  3) When the raid goes @#$ - don't panic; make sure nothing is being
written to the disks and stop.  (Some years ago I lost a raid5 to the
"oops, had a read-error, drop the last disk" issue, and I suspect I
could have saved it had I been patient and stopped working on it until
I was more awake.)


  4) Have/make copies of the mdadm -D / -E output (see the example
after this list).

  5) If you care about the data, do a backup of your raw partitions  
before trying to restore.


  6) The "create the raid on top of the old raid" trick saves the day
again (for a while I had some kind of cabling problem on a box with a
raid6 - I lost track of how many times I did the recreate thing).
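
For lesson 4, a minimal example (the file paths and device names are
just placeholders):

  mdadm -D /dev/md0 > /root/md0-detail.txt
  mdadm -E /dev/sd[a-k]5 > /root/md0-examine.txt

Run from cron or after any change to the array, this keeps the chunk
size, layout and device order on record for a later re-create.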


Secondary question: I'm doing a "dd if=/dev/sdX5 bs=256k > /backup/sdX5"
for each disk -- is there a way to run mdadm on the copies and
experiment on those?  (It took ~forever to copy a terabyte of the
raw partitions.)


(For the archives) - I didn't try it, but setting up the disk images  
as loop devices should work.  I didn't think of that yesterday.
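
A rough, untested sketch of the loop-device idea -- image paths, loop
numbers and the md name below are placeholders:

  losetup /dev/loop0 /backup/sda5
  losetup /dev/loop1 /backup/sdb5
  # ...one loop device per backed-up partition...
  mdadm --assemble /dev/md9 /dev/loop0 /dev/loop1  # ...and the rest
  mdadm --readonly /dev/md9   # keep md from writing to the copies

That way any experiments only ever touch the copies.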



 - ask



[1] http://marc.info/?l=linux-raid&m=120065542429935&w=2



[2] And oh man am I glad I backed them up.  On my first attempt at
recreating the raid I forgot the md device parameter and
--assume-clean, so it created a raid device on one of my source
partitions and immediately started syncing at ~120MB/sec.  Restoring
the partitions from the backup worked fine, fortunately.


--
http://develooper.com/ - http://askask.com/




Re: mdadm error when trying to replace a failed drive in RAID5 array

2008-01-21 Thread Ask Bjørn Hansen


On Jan 20, 2008, at 1:21 PM, Steve Fairbairn wrote:

So the device I was trying to add was about 22 blocks too small.
Taking Neil's suggestion and looking at /proc/partitions showed this up
incredibly quickly.


Always leave a little space at the end; it makes sure you don't run
into that particular problem when you replace disks, and the end of the
disk is often significantly slower anyway.
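
A quick way to catch that before --add (device names made up; the sizes
in /proc/partitions are 1K blocks, and the replacement has to be at
least as large as the existing members):

  grep -E 'sdb1|sdq1' /proc/partitions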


From before the write-intent bitmap stuff I have/had a habit of  
creating separate raids on relatively small partitions (joined  
together by LVM).  I'd just pick a fixed size (on 500GB disks I'd use  
90GB per partition for example) and create however many partitions  
would fit like that and leave the end for scratch space /  
experiments / whatever.
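
A sketch of that layout with invented names and an arbitrary raid level
-- three 90GB partitions per disk, one array per partition "row", all
joined into one volume group:

  mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sd[abcd]1
  mdadm --create /dev/md2 --level=5 --raid-devices=4 /dev/sd[abcd]2
  mdadm --create /dev/md3 --level=5 --raid-devices=4 /dev/sd[abcd]3
  pvcreate /dev/md1 /dev/md2 /dev/md3
  vgcreate data /dev/md1 /dev/md2 /dev/md3
  lvcreate -L 200G -n stuff data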



 - ask

--
http://develooper.com/ - http://askask.com/




Re: One Large md or Many Smaller md for Better Peformance?

2008-01-21 Thread Ask Bjørn Hansen


On Jan 20, 2008, at 2:18 PM, Bill Davidsen wrote:

One partitionable RAID-10, perhaps, then partition as needed.  Read
the discussion here about performance of LVM and RAID.  I personally
don't do LVM unless I know I will have to have great flexibility of
configuration and can give up performance to get it.  Others report
different results, so make up your own mind.
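
A minimal sketch of the partitionable-array approach (device name,
member list and geometry are examples only):

  mdadm --create /dev/md_d0 --auto=part --level=10 --raid-devices=4 /dev/sd[abcd]1
  fdisk /dev/md_d0    # then mkfs on /dev/md_d0p1, /dev/md_d0p2, ...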



On MySQL servers I always, always use LVM, even if performance is
critical there: snapshots!  They make it easy and efficient to do an
online backup[1] after just freezing the database for a second or three.
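
Roughly the pattern that mylvmbackup[1] automates (VG/LV names and
sizes are made up; note the read lock only lasts as long as the client
session that took it, so in practice the snapshot has to be created
from within that session):

  #   mysql>  FLUSH TABLES WITH READ LOCK;
  lvcreate --snapshot --size 5G --name mysql-snap /dev/vg0/mysql
  #   mysql>  UNLOCK TABLES;
  mount /dev/vg0/mysql-snap /mnt/mysql-snap   # back up from the snapshot
  lvremove -f /dev/vg0/mysql-snap             # and drop it when done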



 - ask

[1] http://lenz.homelinux.org/mylvmbackup/

--
http://develooper.com/ - http://askask.com/




2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN

2008-01-21 Thread Mike Snitzer
Under 2.6.22.16, I physically pulled a SATA disk (/dev/sdac, connected to
an aacraid controller) that was acting as the local raid1 member of
/dev/md30.

Linux MD didn't see an /dev/sdac1 error until I tried forcing the issue by
doing a read (with dd) from /dev/md30:

Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key :
Hardware Error [current]
Jan 21 17:08:07 lab17-233 kernel: Info fld=0x0
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense:
Internal target failure
Jan 21 17:08:07 lab17-233 kernel: end_request: I/O error, dev sdac, sector 71
Jan 21 17:08:07 lab17-233 kernel: printk: 3 messages suppressed.
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 8
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 16
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 24
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 32
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 40
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 48
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 56
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 64
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 72
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 80
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key :
Hardware Error [current]
Jan 21 17:08:07 lab17-233 kernel: Info fld=0x0
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense:
Internal target failure
Jan 21 17:08:07 lab17-233 kernel: end_request: I/O error, dev sdac, sector 343
Jan 21 17:08:08 lab17-233 kernel: sd 2:0:27:0: [sdac] Result:
hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jan 21 17:08:08 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key :
Hardware Error [current]
Jan 21 17:08:08 lab17-233 kernel: Info fld=0x0
...
Jan 21 17:08:12 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense:
Internal target failure
Jan 21 17:08:12 lab17-233 kernel: end_request: I/O error, dev sdac, sector 3399
Jan 21 17:08:12 lab17-233 kernel: printk: 765 messages suppressed.
Jan 21 17:08:12 lab17-233 kernel: raid1: sdac1: rescheduling sector 3336

However, the MD layer still hasn't marked the sdac1 member faulty:

md30 : active raid1 nbd2[1](W) sdac1[0]
  4016204 blocks super 1.0 [2/2] [UU]
  bitmap: 1/8 pages [4KB], 256KB chunk

The dd I used to read from /dev/md30 is blocked on IO:

Jan 21 17:13:55 lab17-233 kernel: ddD 0afa9cf5c346
0 12337   7702 (NOTLB)
Jan 21 17:13:55 lab17-233 kernel:  81010c449868 0082
 80268f14
Jan 21 17:13:55 lab17-233 kernel:  81015da6f320 81015de532c0
0008 81012d9d7780
Jan 21 17:13:55 lab17-233 kernel:  81015fae2880 4926
81012d9d7970 0001802879a0
Jan 21 17:13:55 lab17-233 kernel: Call Trace:
Jan 21 17:13:55 lab17-233 kernel:  [80268f14] mempool_alloc+0x24/0xda
Jan 21 17:13:55 lab17-233 kernel:  [88b91381]
:raid1:wait_barrier+0x84/0xc2
Jan 21 17:13:55 lab17-233 kernel:  [8022d8fa]
default_wake_function+0x0/0xe
Jan 21 17:13:55 lab17-233 kernel:  [88b92093]
:raid1:make_request+0x83/0x5c0
Jan 21 17:13:55 lab17-233 kernel:  [80305acd]
__make_request+0x57f/0x668
Jan 21 17:13:55 lab17-233 kernel:  [80302dc7]
generic_make_request+0x26e/0x2a9
Jan 21 17:13:55 lab17-233 kernel:  [80268f14] mempool_alloc+0x24/0xda
Jan 21 17:13:55 lab17-233 kernel:  [8030db39] __next_cpu+0x19/0x28
Jan 21 17:13:55 lab17-233 kernel:  [80305162] submit_bio+0xb6/0xbd
Jan 21 17:13:55 lab17-233 kernel:  [802aba6a] submit_bh+0xdf/0xff
Jan 21 17:13:55 lab17-233 kernel:  [802ae188]
block_read_full_page+0x271/0x28e
Jan 21 17:13:55 lab17-233 kernel:  [802b0b27]
blkdev_get_block+0x0/0x46
Jan 21 17:13:55 lab17-233 kernel:  [803103ad]
radix_tree_insert+0xcb/0x18c
Jan 21 17:13:55 lab17-233 kernel:  [8026d003]
__do_page_cache_readahead+0x16d/0x1df
Jan 21 17:13:55 lab17-233 kernel:  [80248c51] getnstimeofday+0x32/0x8d
Jan 21 17:13:55 lab17-233 kernel:  [80247e5e] ktime_get_ts+0x1a/0x4e
Jan 21 17:13:55 lab17-233 kernel:  [80265543] delayacct_end+0x7d/0x88
Jan 21 17:13:55 lab17-233 kernel:  [8026d0c8]
blockable_page_cache_readahead+0x53/0xb2
Jan 21 17:13:55 lab17-233 kernel:  [8026d1a9]
make_ahead_window+0x82/0x9e
Jan 21 17:13:55 lab17-233 kernel:  [8026d34f]
page_cache_readahead+0x18a/0x1c1
Jan 21 17:13:55 lab17-233 kernel:  [8026723c]
do_generic_mapping_read+0x135/0x3fc
Jan 21 17:13:55 lab17-233 kernel:  [80266755]
file_read_actor+0x0/0x170
Jan 21 17:13:55 lab17-233 kernel:  

Re: idle array consuming cpu ??!!

2008-01-21 Thread Carlos Carvalho
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15:
> On Sunday January 20, [EMAIL PROTECTED] wrote:
>> A raid6 array with a spare and bitmap is idle: not mounted and with no
>> IO to it or any of its disks (obviously), as shown by iostat. However
>> it's consuming cpu: since reboot it used about 11min in 24h, which is quite
>> a lot even for a busy array (the cpus are fast). The array was cleanly
>> shutdown so there's been no reconstruction/check or anything else.
>>
>> How can this be? Kernel is 2.6.22.16 with the two patches for the
>> deadlock ("[PATCH 004 of 4] md: Fix an occasional deadlock in raid5 -
>> FIX") and the previous one.
>
> Maybe the bitmap code is waking up regularly to do nothing.
>
> Would you be happy to experiment?  Remove the bitmap with
>    mdadm --grow /dev/mdX --bitmap=none
>
> and see how that affects cpu usage?

Confirmed, removing the bitmap stopped cpu consumption.
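
(And once the experiment is over, the bitmap can presumably be put back
with the matching grow call:)

  mdadm --grow /dev/mdX --bitmap=internal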


Re: One Large md or Many Smaller md for Better Peformance?

2008-01-21 Thread Carlos Carvalho
Moshe Yudkowsky ([EMAIL PROTECTED]) wrote on 20 January 2008 21:19:
> Thanks for the tips, and in particular:
>
> Iustin Pop wrote:
>
>> - if you download torrents, fragmentation is a real problem, so use a
>>   filesystem that knows how to preallocate space (XFS and maybe ext4;
>>   for XFS use xfs_io to set a bigger extent size for where you
>>   download)
>
> That's a very interesting idea; it also gives me an opportunity to
> experiment with XFS. I had been avoiding it because of possible
> power-failure issues on writes.

I use reiser3 and xfs. reiser3 is very good with many small files. A
simple test shows interactively perceptible results: removing large
files is faster with xfs, removing large directories (e.g. the kernel
tree) is faster with reiser3.
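
For the xfs_io hint quoted above, a minimal example (the directory and
hint size are arbitrary); new files created under the directory inherit
the extent size hint, which cuts down fragmentation:

  xfs_io -c "extsize 64m" /data/torrents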


Re: 2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN

2008-01-21 Thread Mike Snitzer
cc'ing Tanaka-san given his recent raid1 BUG report:
http://lkml.org/lkml/2008/1/14/515

On Jan 21, 2008 6:04 PM, Mike Snitzer [EMAIL PROTECTED] wrote:
 Under 2.6.22.16, I physically pulled a SATA disk (/dev/sdac, connected to
 an aacraid controller) that was acting as the local raid1 member of
 /dev/md30.

 Linux MD didn't see an /dev/sdac1 error until I tried forcing the issue by
 doing a read (with dd) from /dev/md30:

 [... same kernel log, mdstat output and call trace as quoted above ...]