raid10 messed up by false multipath setup (was: raid10 messed up filesystem, lvm lv ok)
On Jan 19, 2008, at 3:44 AM, Ask Bjørn Hansen wrote:

Replying to myself with an update, mostly for the sake of the archives (I went through the linux-raid mail from the last year yesterday while waiting for my raw-partition backups to finish).

I mentioned[1] my trouble with the multipath detection code in the Fedora rescue mode messing up my raid yesterday. [...] I suspect that maybe the layout of the md device got messed up? How can I find out if that's the case? Would it be possible to recover from (assuming all the data is still on some of the disks)?

I realized that of the 11 disks (9 in the raid, 2 spares), one of the disks affected by the fake multipath mishap was a spare, so after backing up all the raw partitions[2] I re-created the raid in place with the other affected disk marked missing, and it seems like the file system is more or less okay. Yes, I'm doing a backup now. :-)

Lessons:

1) Do backups of your raid'ed data. Yes, it can be a pain, but figure it out.

2) Keep your root partition on a simple raid1 (or on an LVM group that's on a simple raid1).

3) When the raid goes @#$ - don't panic, make sure nothing is being written to the disks, and stop. (Some years ago I lost a raid5 to the "oops, had a read error, drop the last disk" issue, and I suspect I could have saved it had I been patient and stopped working on it until I was more awake.)

4) Have/make copies of the mdadm -D / -E output.

5) If you care about the data, do a backup of your raw partitions before trying to restore.

6) The "create the raid on top of the old raid" trick saves the day again (for a while I had some kind of cabling problem on a box with a raid6 - I lost track of how many times I did the recreate thing).

Secondary question: I'm doing a "dd if=/dev/sdX5 of=/backup/sdX5 bs=256k" for each disk -- is there a way to run mdadm on the copies and experiment on those? (It took ~forever to copy a terabyte of raw partitions.)

(For the archives) - I didn't try it, but setting up the disk images as loop devices should work. I didn't think of that yesterday.

 - ask

[1] http://marc.info/?l=linux-raid&m=120065542429935&w=2

[2] And oh man, am I glad I backed them up. On my first attempt at re-creating the raid I forgot the md device parameter and --assume-clean, so it created a raid device on one of my source partitions and immediately started syncing at ~120MB/sec. Fortunately, restoring the partitions from the backup worked fine.

-- 
http://develooper.com/ - http://askask.com/
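(Editorial sketch for the archives: the loop-device idea above would look roughly like this. The image paths and the /dev/md100 name are made-up examples, not from the post; running mdadm against loop devices over the image files keeps the real disks untouched.)

    # Attach each raw-partition image to a loop device, read-only for safety.
    losetup -r /dev/loop0 /backup/sdb5.img
    losetup -r /dev/loop1 /backup/sdc5.img
    losetup -r /dev/loop2 /backup/sdd5.img

    # The copied superblocks can be inspected just like the originals...
    mdadm --examine /dev/loop0

    # ...and the copies assembled under a scratch md device
    # (read-only, if your mdadm supports -o/--readonly for assemble).
    mdadm --assemble --readonly /dev/md100 /dev/loop0 /dev/loop1 /dev/loop2

    # Detach everything when done experimenting.
    mdadm --stop /dev/md100
    losetup -d /dev/loop0
    losetup -d /dev/loop1
    losetup -d /dev/loop2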
Re: mdadm error when trying to replace a failed drive in RAID5 array
On Jan 20, 2008, at 1:21 PM, Steve Fairbairn wrote:

So the device I was trying to add was about 22 blocks too small. Taking Neil's suggestion and looking at /proc/partitions showed this up incredibly quickly.

Always leave a little space at the end; it makes sure you don't run into that particular problem when you replace disks, and the end of the disk is often significantly slower anyway.

From before the write-intent bitmap stuff I have/had a habit of creating separate raids on relatively small partitions (joined together by LVM). I'd just pick a fixed size (on 500GB disks I'd use 90GB per partition, for example), create however many partitions would fit like that, and leave the end for scratch space / experiments / whatever.

 - ask

-- 
http://develooper.com/ - http://askask.com/
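(For the archives, a minimal sketch of the size check mentioned above; the partition names are illustrative, not from Steve's setup.)

    # Compare block counts of the failed member and its intended replacement
    # before running mdadm --add; the replacement must be at least as large.
    grep -E 'sdb1|sdc1' /proc/partitions

    # Or ask for the sizes in bytes directly.
    blockdev --getsize64 /dev/sdb1
    blockdev --getsize64 /dev/sdc1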
Re: One Large md or Many Smaller md for Better Performance?
On Jan 20, 2008, at 2:18 PM, Bill Davidsen wrote:

One partitionable RAID-10, perhaps, then partition as needed. Read the discussion here about performance of LVM and RAID. I personally don't do LVM unless I know I will need great flexibility of configuration and can give up performance to get it. Others report different results, so make up your own mind.

On MySQL servers I always use LVM, even when performance is critical there: snapshots![1] They make it easy and efficient to do an online snapshot after just freezing the database for a second or three.

 - ask

[1] http://lenz.homelinux.org/mylvmbackup/

-- 
http://develooper.com/ - http://askask.com/
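(A rough sketch of the snapshot dance described above; the volume group, LV, snapshot size, and mount-point names are made-up examples, and mylvmbackup[1] automates essentially this sequence. Assumes the MySQL datadir lives on /dev/vg0/mysql.)

    # In a mysql session, hold a read lock while the snapshot is created;
    # the lock is released when that client disconnects, so keep it open:
    #   mysql> FLUSH TABLES WITH READ LOCK;

    # From another shell, snapshot the data LV (takes a second or three):
    lvcreate --snapshot --size 5G --name mysql-snap /dev/vg0/mysql

    # Back in the mysql session:
    #   mysql> UNLOCK TABLES;

    # Mount the snapshot read-only, copy the data off at leisure, clean up.
    mount -o ro /dev/vg0/mysql-snap /mnt/mysql-snap
    rsync -a /mnt/mysql-snap/ /backup/mysql/
    umount /mnt/mysql-snap
    lvremove -f /dev/vg0/mysql-snap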
2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN
Under 2.6.22.16, I physically pulled a SATA disk (/dev/sdac, connected to an aacraid controller) that was acting as the local raid1 member of /dev/md30. Linux MD didn't see a /dev/sdac1 error until I tried forcing the issue by doing a read (with dd) from /dev/md30:

Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key : Hardware Error [current]
Jan 21 17:08:07 lab17-233 kernel: Info fld=0x0
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense: Internal target failure
Jan 21 17:08:07 lab17-233 kernel: end_request: I/O error, dev sdac, sector 71
Jan 21 17:08:07 lab17-233 kernel: printk: 3 messages suppressed.
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 8
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 16
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 24
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 32
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 40
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 48
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 56
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 64
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 72
Jan 21 17:08:07 lab17-233 kernel: raid1: sdac1: rescheduling sector 80
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key : Hardware Error [current]
Jan 21 17:08:07 lab17-233 kernel: Info fld=0x0
Jan 21 17:08:07 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense: Internal target failure
Jan 21 17:08:07 lab17-233 kernel: end_request: I/O error, dev sdac, sector 343
Jan 21 17:08:08 lab17-233 kernel: sd 2:0:27:0: [sdac] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Jan 21 17:08:08 lab17-233 kernel: sd 2:0:27:0: [sdac] Sense Key : Hardware Error [current]
Jan 21 17:08:08 lab17-233 kernel: Info fld=0x0
...
Jan 21 17:08:12 lab17-233 kernel: sd 2:0:27:0: [sdac] Add. Sense: Internal target failure
Jan 21 17:08:12 lab17-233 kernel: end_request: I/O error, dev sdac, sector 3399
Jan 21 17:08:12 lab17-233 kernel: printk: 765 messages suppressed.
Jan 21 17:08:12 lab17-233 kernel: raid1: sdac1: rescheduling sector 3336

However, the MD layer still hasn't marked the sdac1 member faulty:

md30 : active raid1 nbd2[1](W) sdac1[0]
      4016204 blocks super 1.0 [2/2] [UU]
      bitmap: 1/8 pages [4KB], 256KB chunk

The dd I used to read from /dev/md30 is blocked on IO:

Jan 21 17:13:55 lab17-233 kernel: dd            D 0afa9cf5c346     0 12337   7702 (NOTLB)
Jan 21 17:13:55 lab17-233 kernel:  81010c449868 0082 80268f14
Jan 21 17:13:55 lab17-233 kernel:  81015da6f320 81015de532c0 0008 81012d9d7780
Jan 21 17:13:55 lab17-233 kernel:  81015fae2880 4926 81012d9d7970 0001802879a0
Jan 21 17:13:55 lab17-233 kernel: Call Trace:
Jan 21 17:13:55 lab17-233 kernel:  [80268f14] mempool_alloc+0x24/0xda
Jan 21 17:13:55 lab17-233 kernel:  [88b91381] :raid1:wait_barrier+0x84/0xc2
Jan 21 17:13:55 lab17-233 kernel:  [8022d8fa] default_wake_function+0x0/0xe
Jan 21 17:13:55 lab17-233 kernel:  [88b92093] :raid1:make_request+0x83/0x5c0
Jan 21 17:13:55 lab17-233 kernel:  [80305acd] __make_request+0x57f/0x668
Jan 21 17:13:55 lab17-233 kernel:  [80302dc7] generic_make_request+0x26e/0x2a9
Jan 21 17:13:55 lab17-233 kernel:  [80268f14] mempool_alloc+0x24/0xda
Jan 21 17:13:55 lab17-233 kernel:  [8030db39] __next_cpu+0x19/0x28
Jan 21 17:13:55 lab17-233 kernel:  [80305162] submit_bio+0xb6/0xbd
Jan 21 17:13:55 lab17-233 kernel:  [802aba6a] submit_bh+0xdf/0xff
Jan 21 17:13:55 lab17-233 kernel:  [802ae188] block_read_full_page+0x271/0x28e
Jan 21 17:13:55 lab17-233 kernel:  [802b0b27] blkdev_get_block+0x0/0x46
Jan 21 17:13:55 lab17-233 kernel:  [803103ad] radix_tree_insert+0xcb/0x18c
Jan 21 17:13:55 lab17-233 kernel:  [8026d003] __do_page_cache_readahead+0x16d/0x1df
Jan 21 17:13:55 lab17-233 kernel:  [80248c51] getnstimeofday+0x32/0x8d
Jan 21 17:13:55 lab17-233 kernel:  [80247e5e] ktime_get_ts+0x1a/0x4e
Jan 21 17:13:55 lab17-233 kernel:  [80265543] delayacct_end+0x7d/0x88
Jan 21 17:13:55 lab17-233 kernel:  [8026d0c8] blockable_page_cache_readahead+0x53/0xb2
Jan 21 17:13:55 lab17-233 kernel:  [8026d1a9] make_ahead_window+0x82/0x9e
Jan 21 17:13:55 lab17-233 kernel:  [8026d34f] page_cache_readahead+0x18a/0x1c1
Jan 21 17:13:55 lab17-233 kernel:  [8026723c] do_generic_mapping_read+0x135/0x3fc
Jan 21 17:13:55 lab17-233 kernel:  [80266755] file_read_actor+0x0/0x170
Jan 21 17:13:55 lab17-233 kernel:
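(Editorial note for the archives: when the kernel does cooperate, the usual manual workaround is to fail and remove the member by hand with mdadm. The commands below use the device names from the report and are only a sketch, not something that was tried here.)

    # Explicitly mark the pulled member faulty, then drop it from the array.
    mdadm /dev/md30 --fail /dev/sdac1
    mdadm /dev/md30 --remove /dev/sdac1

    # Check whether the array state actually changed.
    cat /proc/mdstat
    mdadm --detail /dev/md30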
Re: idle array consuming cpu ??!!
Neil Brown ([EMAIL PROTECTED]) wrote on 21 January 2008 12:15:

On Sunday January 20, [EMAIL PROTECTED] wrote:

A raid6 array with a spare and bitmap is idle: not mounted and with no IO to it or any of its disks (obviously), as shown by iostat. However it's consuming cpu: since reboot it has used about 11min in 24h, which is quite a lot even for a busy array (the cpus are fast). The array was cleanly shut down, so there's been no reconstruction/check or anything else. How can this be? Kernel is 2.6.22.16 with the two patches for the deadlock ([PATCH 004 of 4] md: Fix an occasional deadlock in raid5 - FIX) and the previous one.

Maybe the bitmap code is waking up regularly to do nothing. Would you be happy to experiment? Remove the bitmap with

  mdadm --grow /dev/mdX --bitmap=none

and see how that affects cpu usage?

Confirmed, removing the bitmap stopped the cpu consumption.
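(A small sketch of the experiment as run; the md device name and the ps invocation are illustrative, and the bitmap can be put back afterwards with --bitmap=internal.)

    # Note how much cpu the array's kernel thread has accumulated so far.
    ps -eo comm,cputime | grep ^md

    # Drop the internal bitmap...
    mdadm --grow /dev/md0 --bitmap=none

    # ...let the box idle for a while, check the thread's cpu time again,
    # then restore the bitmap.
    mdadm --grow /dev/md0 --bitmap=internal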
Re: One Large md or Many Smaller md for Better Performance?
Moshe Yudkowsky ([EMAIL PROTECTED]) wrote on 20 January 2008 21:19:

Thanks for the tips, and in particular:

Iustin Pop wrote:
- if you download torrents, fragmentation is a real problem, so use a filesystem that knows how to preallocate space (XFS and maybe ext4; for XFS, use xfs_io to set a bigger extent size for where you download)

That's a very interesting idea; it also gives me an opportunity to experiment with XFS. I had been avoiding it because of possible power-failure issues on writes.

I use reiser3 and xfs. reiser3 is very good with many small files. A simple test shows interactively perceptible results: removing large files is faster with xfs; removing large directories (e.g. the kernel tree) is faster with reiser3.
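(A small sketch of the xfs_io suggestion; the directory and the 64m value are arbitrary examples. An extent size hint set on a directory is inherited by new files created under it.)

    # Set a larger extent size hint on the download directory.
    xfs_io -c "extsize 64m" /data/torrents

    # Check the current hint.
    xfs_io -c "extsize" /data/torrents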
Re: 2.6.22.16 MD raid1 doesn't mark removed disk faulty, MD thread goes UN
cc'ing Tanaka-san given his recent raid1 BUG report: http://lkml.org/lkml/2008/1/14/515

On Jan 21, 2008 6:04 PM, Mike Snitzer [EMAIL PROTECTED] wrote:

Under 2.6.22.16, I physically pulled a SATA disk (/dev/sdac, connected to an aacraid controller) that was acting as the local raid1 member of /dev/md30. Linux MD didn't see a /dev/sdac1 error until I tried forcing the issue by doing a read (with dd) from /dev/md30. [...]

However, the MD layer still hasn't marked the sdac1 member faulty, and the dd I used to read from /dev/md30 is blocked on IO. [...]