Re: 2.2.16 RAID patch
On Tue, 13 Jun 2000, Marc Haber wrote: A kernel patched this way doesn't build with Debian's kernel-package. It complains: "The version number 2.2.16-RAID is not all lowercase. Stop." Could this be changed to 2.2.16-raid for future versions, or should I rather get in touch with kernel-package's maintainer?

if there is any good reason for this rule then i can change it to -raid. Does anyone know why uppercase characters are a problem?

Ingo
MD_BOOT is _flawed_
On Tue, 13 Jun 2000, Neil Brown wrote: One way is by setting the partition type of the relevant partitions. This is nice and easy, but requires you to use MSDOS style partition tables (which only 99.4% of Linux users do :-), and works fine for RAID0 or 1 or 5 or Linear.

no, (and i told you this before) it does not need MSDOS-style partition tables. Linux's partition code is 'generic', and when i implemented this i only added the 'low-level glue' to support MSDOS-style partitions (that was what i could test). It's trivial to add code for every other partitioning type as well, such as BSD disklabels, which are available on every platform.

(two minutes later)

In fact i've just added BSD-disklabel autostart support to my tree, it's a two-liner:

--- msdos.c.orig	Mon Jun 12 03:27:25 2000
+++ msdos.c	Tue Jun 13 00:46:42 2000
@@ -256,6 +256,8 @@
 		}
 		/* if the bsd partition is not currently known to linux, we end
 		 * up here */
+		if (bsd_p->p_fstype == LINUX_RAID_PARTITION)
+			md_autodetect_dev(MKDEV(hd->major, current_minor));
 		add_gd_partition(hd, current_minor, bsd_p->p_offset, bsd_p->p_size);
 		current_minor++;
 	}

this is the major reason why i consider MD_BOOT an inferior solution, and i'm still convinced about phasing it out (gradually, later on). We do not need two ways of boot-time starting up arrays, and superblock-less arrays are dangerous anyway. Especially as MD_BOOT is fundamentally inferior.

The other way is to explicitly tell the kernel via md= options. This [...] doesn't easily deal with devices changing name (as scsi devices can do when you plug in new devices).

it also doesn't deal with disk failures. Autostart is able to start up a (failure-resistant, such as RAID1/RAID5) array even if one of the disks has failed. MD_BOOT cannot deal with certain types of disk failures.

So both have shortcomings, [...]

no, only MD_BOOT has shortcomings, and i'm very convinced it will be phased out. The only reason i accepted it is that some people want to start up (legacy) non-persistent arrays at boot time. I'm going to remove the ability to start up persistent arrays via MD_BOOT, so that people do not get into the habit of starting up persistent arrays in an inferior way with MD_BOOT. It's a pure compatibility thing, the MD_BOOT code is short and localized.

be addressed, and probably will over the months, but in any case, it's nice to have a choice (unless it is confusing I guess).

MD_BOOT cannot be fixed. It will _not_ be able to start up arrays if the device name changes (eg. due to a failure), no matter how hard you try. And yes, MD_BOOT is confusing.

Me: I choose MD_BOOT because I like explicit control.

Persistent arrays are _not_ identified by their (temporary) device names. Anything short of autostart arrays does not make full use of Linux-RAID's capabilities to deal with various failure scenarios, and is fundamentally flawed. (if you want to have an explicit pre-rootmount startup method then you can still use initrd.)

Ingo
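To illustrate the pattern the two-liner above follows, here is a hedged sketch of what the same hook could look like for some other partition format. Only md_autodetect_dev(), MKDEV(), add_gd_partition() and LINUX_RAID_PARTITION are taken from the patch above; the foo_* names are made up purely for illustration.

/*
 * Sketch only: during partition-table parsing, any partition whose type
 * is LINUX_RAID_PARTITION is handed to the md layer, which collects
 * these devices and autostarts the arrays later during boot.
 */
static void foo_register_partition(struct gendisk *hd, int minor,
                                   struct foo_partition *p)
{
	add_gd_partition(hd, minor, p->start_sect, p->nr_sects);

	if (p->type == LINUX_RAID_PARTITION)
		md_autodetect_dev(MKDEV(hd->major, minor));
}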
[patch] RAID 0/1/4/5 release, raid-2.4.0-test1-ac15-B4
you can find the latest 2.4 RAID code at:

http://www.redhat.com/~mingo/raid-patches/raid-2.4.0-test1-ac15-B4

this is against the latest Alan Cox kernel (ac15), which can be found at:

http://www.kernel.org/pub/linux/kernel/people/alan/2.4.0test

which is against the stock 2.4.0-test1 kernel. I'd urge every 2.4 RAID user to upgrade to the ac- kernels and this RAID patch, as it fixes critical bugs. Users of the production 2.2-based RAID code should not upgrade yet.

this release contains most of the fixes from Neil Brown and Jakob Oestergaard (thanks Neil and Jakob!) - i've cleaned those patches up and fixed bugs in them as well. It also contains the one-liner bugfix from Anton Altaparmakov. This RAID release finally also adds Mika Kuoppala's RAID1 read-balancing code, which is a great speedup for RAID1 systems. (cool stuff Mika!)

if any bug is still present in this release then please resend the bug report. Please resend patches if any of them didn't make it in (it's likely due to cleanliness issues). i'm also very interested in slowdowns relative to 2.2+latest_RAID, for all RAID levels - do they still happen with this patchset as well?

i've tested the patch and it's stable under all circumstances i could reproduce - be careful nevertheless. (RAID0/RAID1/RAID5 under SMP is tested)

Ingo
2.2.16 RAID patch
the latest 2.2 (production) RAID code against 2.2.16-final can be found at: http://www.redhat.com/~mingo/raid-patches/raid-2.2.16-A0 let me know if you have any problems with it. Ingo
Re: 2.2.16 RAID patch
On Mon, 12 Jun 2000, Stephen Frost wrote: Didn't appear to patch cleanly against a clean 2.2.16 tree, error was in md.c and left a rather large .rej file.. ouch, right - i've uploaded a new patch. (this problem was caused by a bug in creating the patch) Ingo
Re: 2.2.16 RAID patch
On Mon, 12 Jun 2000, Stephen Frost wrote: ouch, right - i've uploaded a new patch. (this problem was caused by a bug in creating the patch) Much nicer, patched cleanly, thanks. Now time to see if it compiles and works happily. ;) it should :-) the problem was in creating the patch - the code itself didn't change. Ingo
RE: [patch] RAID 0/1/4/5 release, raid-2.4.0-test1-ac15-B4
On Mon, 12 Jun 2000, Darren Evans wrote: can raidtools-19990824-0.90.tar.gz be used with your patch available on http://people.redhat.com/mingo/raid-patches/raid-2.2.16-A0 for new-style RAID on a 2.2.16 kernel, instead of the raid0145-19990824-2.2.11 patch? yep. I noticed the name had an A0 at the end, presumably that's an alpha 0 release on 2.2.16? no, it's the 'stable' release. 'A0' is just an internal id for me. Ingo
Re: Benchmarks, raid0 performance, 1,2,3,4 drives
could you send me your /etc/raidtab? I've tested the performance of 4-disk RAID0 on SCSI, and it scales perfectly here, as far as hdparm -t goes. (could you also send the 'hdparm -t /dev/md0' results; do you see a degradation in those numbers as well?) it could either be some special thing in your setup, or an IDE+RAID performance problem. Ingo
Re: Disk failure-Error message indicates bug
On Fri, 19 May 2000, Neil Brown wrote: - md2 checks b_rdev to see which device was in error. It gets confused because sda12 is not part of md2. The fix probably involves making sure that b_dev really does refer to md0 (a quick look at the code suggests it actually refers to md2!) and then using b_dev instead of b_rdev.

the fix i think is to not look at b_rdev in the error path (and anywhere else) at all. Just like we don't look at rsector. Do we need that information? b_rdev is in fact just for RAID0 and LINEAR, and i believe it would be cleaner to get rid of it altogether, and create a new encapsulated bh for every RAID0 request, like we do it in RAID1/RAID5. OTOH handling this is clearly more complex than RAID0 itself.

Basically, b_rdev and b_rsector cannot be trusted after a call to make_request, but they are being trusted.

yep. What about this solution: md.c (or buffer.c) implements a generic pool of IO-related buffer-heads. This pool would have deadlock assurance, and allocation from this pool could never fail. This would already reduce the complexity of raid1.c and raid5.c bh-allocation. Then raid0.c and linear.c are changed to create a new bh for the mapping, which is hung off bh->b_dev_id. bh->b_rdev would be gone, ll_rw_blk looks at bh->b_dev. This also simplifies the handling of bhs.

i like this solution much better, and i don't think there is any significant performance impact (starting IO is heavy anyway), but it would clean up this issue once and for all.

Ingo
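A rough sketch of the bh-cloning approach being proposed here (mdbh_alloc()/mdbh_free() stand for the hypothetical 'generic pool' mentioned above, and the exact field usage and 2.3-era interfaces are approximated; only the general shape follows the mail):

/* completion handler for the cloned bh: finish the original request */
static void raid0_end_clone_io(struct buffer_head *clone, int uptodate)
{
	struct buffer_head *orig = (struct buffer_head *) clone->b_dev_id;

	orig->b_end_io(orig, uptodate);
	mdbh_free(clone);
}

/* instead of rewriting b_rdev/b_rsector in place, submit a fresh bh */
static int raid0_make_request_cloned(int rw, struct buffer_head *bh,
				     kdev_t target, unsigned long tsector)
{
	struct buffer_head *clone = mdbh_alloc();	/* never fails, per the proposal */

	*clone = *bh;			/* same data page, same size */
	clone->b_dev = target;		/* ll_rw_blk would look at b_dev only */
	clone->b_rsector = tsector;
	clone->b_dev_id = bh;		/* remember the original request */
	clone->b_end_io = raid0_end_clone_io;

	generic_make_request(rw, clone);
	return 0;			/* nothing left for the caller to do */
}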
Re: Can't recover raid5 1 disk failure - Could not import [dev21:01]!
On Wed, 12 Apr 2000, Darren Nickerson wrote: So no problem, I have 3 of the four left, right? The array was marked [_UUU] just before I power cycled (the disk was crashing) and since it had been marked faulty, I was able to raidhotremove the underlined one. But now, it won't boot into degraded mode. As I try to boot redhat to single user, I am told:

md: could not lock [dev 21:01], zero size? Marking faulty.
Could not import [dev 21:01]!
Autostart [dev 21:01] failed!

this happens because raidstart looks at the first entry in /etc/raidtab to start up an array. If that entry is damaged, it does not cycle through the other entries to start up the array. The solution is to permute the entries in /etc/raidtab. (make sure to restore the original order)

if you switch to boot-time autostart then this should not happen: RAID partitions are first collected, then started up, and the code should be able to start up the array no matter which disk got damaged.

Ingo
Re: Can't recover raid5 1 disk failure - Could not import [dev21:01]!
On Wed, 12 Apr 2000, Darren Nickerson wrote: I'm confused. I thought I WAS boot-time autostarting. RedHat's definitely autodetecting and starting the array very early in the boot process, but I'm clearly not entirely properly set up here because my partition types are not 0xfd, which seems to be important for some reason or another. [...]

well, it was boot-time 'very early' autostarting, but not RAID autostarting in the classic sense. I think i'll fix raidstart to simply iterate through all available partitions, until one is started up correctly (or until all entries fail). This still doesn't cover all the cases which are covered by the 0xfd method (such as card failure, device reshuffling, etc.), but it should cover your case (which is definitely the most common one).

So, you're saying that the array would have automatically recovered if I had had all five partitions set to 0xfd?

yes, definitely. Not marking a partition 0xfd is the more conservative approach from the installer's point of view in a possibly multi-OS environment; you can always mark it 0xfd later on.

Ingo
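A sketch of the raidstart change being described (userspace; struct md_cfg and start_from_member() are hypothetical stand-ins for whatever raidstart currently does with the first raidtab entry):

#include <stdio.h>

/*
 * Instead of trying only the first device listed in the raidtab entry,
 * walk all of them until the kernel accepts one of them for autostart.
 */
static int start_array(struct md_cfg *cfg)
{
	int i;

	for (i = 0; i < cfg->nr_disks; i++)
		if (start_from_member(cfg->md_name, cfg->disk_name[i]) == 0)
			return 0;	/* array is up and running */

	fprintf(stderr, "%s: could not start the array from any member device\n",
		cfg->md_name);
	return -1;
}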
[patch] block device stacking support, raid-2.3.47-B6
Heinz, Andrea, Linus,

various ideas/patches regarding block device stacking support were floating around in the last couple of days, here is a patch against vanilla 2.3.47 that solves both RAID's and LVM's needs sufficiently:

http://www.redhat.com/~mingo/raid-patches/raid-2.3.47-B6

(also attached)

Andrea's patch from yesterday touches some of the issues, but RAID has different needs wrt. ->make_request():

- RAID1 and RAID5 need truly recursive ->make_request() stacking, because the relationship between the request-bh and the IO-bh is not 1:1. In the case of RAID0/linear and LVM the mapping is 1:1, so no on-stack recursion is necessary.

- re-grabbing the device queue in generic_make_request() is necessary, just think of RAID0+LVM stacking.

- IO errors have to be initiated in the layer that notices them.

- i don't agree with moving the ->make_request() function to be a per-major thing, in the (near) future i'd like to implement RAID personalities via several sub-queues of a single RAID blockdevice, avoiding the current md_make_request internal step completely.

- renaming ->make_request_fn() to ->logical_volume_fn is both misleading and unnecessary.

i've added the good bits (i hope i found all of them) from Andrea's patch as well: the end_io() fix in md.c, the ->make_request() change returning IO errors, and avoiding an unnecessary get_queue() in the fast path.

the patch changes blkdev->make_request_fn() semantics, but these work pretty well for RAID0, LVM, RAID1 and RAID5:

(bh->b_dev, bh->b_blocknr): just like today, never modified; this is the 'physical index' of the buffer-cache. internally, any special ->make_request() function is forbidden to access b_dev and b_blocknr too; b_rdev and b_rsector have to be used. ll_rw_block() correctly installs an identity mapping first, and all stacked devices just iterate one more step.

bh->b_rdev: the 'current target device'
bh->b_rsector: the 'current target sector'

the return values of ->make_request_fn():

  ret == 0: don't continue iterating and don't submit IO
  ret > 0:  continue iterating
  ret < 0:  IO error (already handled by the layer which noticed it)

we explicitly rely on ll_rw_blk getting the BH_Lock and not calling ->make_request() on this bh more than once.

with these semantics all the variations are possible, it's up to the device to use the one it likes best:

- the device resolves one mapping step and returns 1 (RAID0, LVM)
- the device calls generic_make_request() and returns 1 (RAID1, RAID5)
- the device resolves recursion internally and returns 0 (future RAID0), and returns 1 if the recursion cannot be resolved internally.

generic_make_request() returns 0 if it has submitted IO - thus generic_make_request() can also be used as a queue's ->make_request_fn() function - it's completely symmetric. (not that anyone would want to do this)

NOTE: a device might still resolve stacking internally, if it can. Eg. the next version of raid0.c will do a while loop internally if we map RAID0->RAID0. The performance advantage is obvious: no indirect function calls and no get_queue(). LVM could do the same as well.

(the patch modifies lvm.c to reflect these new semantics, to not rely on b_dev and b_blocknr and to not call generic_make_request(), and fixes the lvm.c hack avoiding MD-LVM stacking. These changes are untested.)
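As a concrete illustration of the one-step-mapping case, a hedged sketch of what a RAID0/LVM-style ->make_request_fn() could look like under these semantics (the prototype follows the discussion above; my_map() is a made-up stand-in for the device's own mapping function):

/* resolve exactly one mapping step, then let the caller keep iterating */
static int onestep_make_request(request_queue_t *q, int rw,
				struct buffer_head *bh)
{
	kdev_t target;
	unsigned long tsector;

	/* only b_rdev/b_rsector may be touched here, never b_dev/b_blocknr */
	if (my_map(bh->b_rdev, bh->b_rsector, &target, &tsector) < 0) {
		bh->b_end_io(bh, 0);	/* the layer that notices the error initiates it */
		return -1;		/* < 0: IO error, already handled */
	}

	bh->b_rdev = target;		/* one more step in the iteration */
	bh->b_rsector = tsector;
	return 1;			/* > 0: continue iterating */
}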
with this method it was pretty straightforward to add stacked RAID0 and linear device support, here is a sample RAID0+RAID0 = RAID0 stacking:

[root@moon /root]# cat /proc/mdstat
Personalities : [linear] [raid0]
read_ahead 1024 sectors
md2 : active raid0 mdb[1] mda[0] 1661472 blocks 4k chunks
md1 : active raid0 sdf1[1] sde1[0] 830736 blocks 4k chunks
md0 : active raid0 sdd1[1] sdc1[0] 830736 blocks 4k chunks
unused devices: <none>

[root@moon /root]# df /mnt
Filesystem         1k-blocks  Used Available Use% Mounted on
/dev/md2             1607473    13   1524387   0% /mnt

The LVM changes are not tested. The RAID0/linear changes compile/boot/work just fine and are reasonably well-tested and understood.

any objections?

Ingo

--- linux/include/linux/raid/md_k.h.orig	Wed Feb 23 06:00:20 2000
+++ linux/include/linux/raid/md_k.h	Wed Feb 23 06:22:02 2000
@@ -75,6 +75,8 @@
 extern inline mddev_t * kdev_to_mddev (kdev_t dev)
 {
+	if (MAJOR(dev) != MD_MAJOR)
+		BUG();
 	return mddev_map[MINOR(dev)].mddev;
 }
@@ -213,7 +215,7 @@
 	char *name;
 	int (*map)(mddev_t *mddev, kdev_t dev, kdev_t *rdev,
 			unsigned long *rsector, unsigned long size);
-	int (*make_request)(mddev_t *mddev, int rw, struct buffer_head * bh);
+	int (*make_request)(request_queue_t *q, mddev_t *mddev, int rw, struct buffer_head * bh);
 	void
Re: [patch] block device stacking support, raid-2.3.47-B6
On Wed, 23 Feb 2000, Andrea Arcangeli wrote:

- renaming ->make_request_fn() to ->logical_volume_fn is both misleading and unnecessary.

Note that with my proposal it was make_request_fn that was misleading, because none of the code run within the callback had anything to do with the make_request code.

ok, your variant was more like a ->map_buffer_fn() thing - like the old md_map() stuff. ->make_request_fn() is closer to 'make request' in the context of RAID1 and RAID5. (even if this is not visible now)

- device resolves recursion internally and returns 0 (future RAID0), returns 1 if recursion cannot be resolved internally.

I don't think it's worth handling such a case if it costs something for the other cases. I'll check and test the code on the LVM side soon.

the cost is only in the device (not in the generic block IO code), it's an 'if (MAJOR(bh->b_rdev) == MD_MAJOR) goto repeat;' type of thing (analogous in the LVM code), nothing more. We will see how common it gets - it's just a nice side-effect that the possibility is there.

Ingo
Re: Current raid driver for 2.3.42?
On Tue, 8 Feb 2000, Mike Panetta wrote: I am looking for an updated raid driver for kernel 2.3.42+ Does such a beast exist? I looked on Ingo's site and only found a patch for kernel 2.3.40. This patch did not patch cleanly at all. the newest RAID code is being merged into 2.3.43 right now. The RAID0 and linear changes, plus most of the md.c 'infrastructure' changes are already in pre5-2.3.43. The next step is RAID1 (including Mika Kuoppala's nice read balancing patch) and RAID4/5. WARNING: while this is a 'full merge' (ie. all the latest 0.90 stuff and more will show up), and RAID0/linear is pretty functional already, do not consider this to be near the reliability of 2.2+latest_raid, for a couple of weeks, at least. -- mingo
Re: Current raid driver for 2.3.42?
On Wed, 9 Feb 2000, James Manning wrote: [ Wednesday, February 9, 2000 ] Ingo Molnar wrote: the newest RAID code is being merged into 2.3.43 right now. (Hopefully) quick question. Will KNI work? i'll make sure it works (it certainly didn't in the past) - xor.c can afford full FPU saves so it needs no generic kernel support. -- mingo
Re: raid145 patches for 2.2.14 anywhere?
On Thu, 13 Jan 2000, Thomas Gebhardt wrote: just looked for the raid patches for 2.2.13 or 2.2.14 in the kernel archive. The last patches that I have found are for 2.2.11, and at least one hunk cannot be applied to the newer kernel sources without getting your hands dirty. Can I get the patches for the newer kernels anywhere? it's at: http://www.redhat.com/~mingo/raid-2.2.14-B1 it applies cleanly to vanilla 2.2.14, do a 'patch -p0 < raid-2.2.14-B1'. -- mingo
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure =problems ?
On Wed, 12 Jan 2000, Gadi Oxman wrote: As far as I know, we took care not to poke into the buffer cache to find clean buffers -- in raid5.c, the only code which does a find_buffer() is:

yep, this is still the case. (Sorry Stephen, my bad.) We will have these problems once we try to eliminate the current copying overhead.

Nevertheless there are bad (illegal) interactions between the RAID code and the buffer cache; i'm cleaning this up for 2.3 right now. Especially the reconstruction code is a rathole. Unfortunately blocking reconstruction if b_count == 0 is not acceptable, because several filesystems (such as ext2fs) keep metadata caches around (eg. the block group descriptors in the ext2fs case) which have b_count == 1 for a longer time.

If both power and a disk fail at once then we might still get local corruption for partially written RAID5 stripes. If either power or a disk fails, then the Linux RAID5 code is safe wrt. journalling, because it behaves like an ordinary disk. We are '100% journal-safe' if power fails during resync. We are also 100% journal-safe if power fails during reconstruction of a failed disk or in degraded mode.

the 2.3 buffer-cache enhancements i wrote ensure that 'cache snooping' and adding to the buffer-cache can be done safely by 'external' cache managers. I also added means to do atomic IO operations which are in fact several underlying IO operations - without the need of allocating a separate bh. The RAID code uses these facilities now.

Ingo
Re: WARNING: raid for kernel 2.2.11 used with 2.2.14 panics
On Wed, 5 Jan 2000, Robert Dahlem wrote: I just wanted to warn everybody not to use raid0145-19990824-2.2.11 together with kernel 2.2.14: at least in my configuration (two IDE drives with RAID-1, root on /dev/mdx) the kernel panics with "B_FREE inserted into queues" at boot time. this should be fixed in: http://www.redhat.com/~mingo/raid-2.2.14-B1 let me know if you still have any problem. The problem outlined by Andrea's patch (which reverses a patch of mine) is solved as well. -- mingo
Re: raidtools for 2.3.36?
On Thu, 6 Jan 2000 [EMAIL PROTECTED] wrote: I am trying to build a raid0 array with two 500 MB SCSI disks, using 2.3.36. 2.3.36 is broken wrt. RAID0 (even old RAID0 is broken). The new 2.3 RAID patch i'm working on for 2.3.36 still has some instabilities in RAID1, but RAID0 is rock solid. Will send a patch today or tomorrow, even if RAID1 is still unstable, so that RAID0 (and the related ll_rw_blk.c and buffer.c changes) can be tested separately. -- mingo
Re: Help Raid for sparc
chunksize does have an important meaning in the linear case: it's 'rounding'. We cannot change this unilaterally (it breaks backwards compatibility), and i believe it does make sense. [certain disks serve requests faster if they have proper alignment and size. I do not think we should assume that an arbitrarily misaligned IO request will perform identically.] So i'll fix raidtools to enforce chunk-size in the linear case (maybe introduce a 'rounding' keyword?).

Ingo

On Fri, 26 Nov 1999, Jakub Jelinek wrote: On Fri, Nov 26, 1999 at 09:43:06AM +0100, [EMAIL PROTECTED] wrote:

Hallo, I have a Sparc 10 running Linux 6.1. I have two disks of 1Gb and 1.7Gb. I would like to do a linear raid, but when I do "raidstart -a /dev/md0" in the shell I receive "/dev/md0: Invalid argument", and on the console I read:

sdb1's sb offset: 1026048 [events: 20202020]
md: invalid raid superblock magic on sdb1
md: sdb1 has invalid sb, not importing!
could not import sdb1!
autostart sdb1 failed!

My kernel is 2.2.12-42 and raidtools-0.90. If I do mkraid /dev/md0 I receive in the shell:

handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 1026144kB, raid superblock at 1026048kB
disk 1: /dev/sdc1, 1720345kB, raid superblock at 1720256kB
/dev/md0: Invalid argument

and on the console I receive some messages (they are in the attached file). My /etc/raidtab is:

raiddev /dev/md0
	raid-level		linear
	nr-raid-disks		2
	nr-spare-disks		0
	persistent-superblock	1

put

	chunk-size		8

here and redo mkraid (possibly with -f). It seems that the kernel always checks the chunk size, while raidtools check the chunk size for raid0,1,4,5 only. IMHO the kernel should not check the chunk size for other raid levels, but if Ingo thinks it should, then raidtools should either error out on a missing chunk-size for the other levels as well, or supply some default which will not trigger the md.c MD_BUG().

Cheers,
Jakub

___
Jakub Jelinek | [EMAIL PROTECTED] | http://sunsite.mff.cuni.cz/~jj
Linux version 2.3.18 on a sparc64 machine (1343.49 BogoMips)
___
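Putting Jakub's suggestion together, the corrected raidtab for this setup would presumably look as follows (the device stanzas are reconstructed from the mkraid output above; after editing, redo mkraid, possibly with -f):

raiddev /dev/md0
	raid-level		linear
	nr-raid-disks		2
	nr-spare-disks		0
	persistent-superblock	1
	chunk-size		8
	device			/dev/sdb1
	raid-disk		0
	device			/dev/sdc1
	raid-disk		1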
Re: Problems with persistant superblocks and drive removal
i suspect this is what happened:

md: md0, array needs 12 disks, has 7, aborting.
raid0: disks are not ordered, aborting!

raidstart was still using the old raidtab to start up the array. It found an old array's superblock and tried to start it up. Some disks were not available, so the raid0 module refused to run and aborted in a safe manner.

the behavior you saw is normal (unless my analysis is wrong; i do not claim that there might not be bugs left). The only way we can guarantee protection against device reordering is marking RAID-enabled partitions as autostartable. For that to work on Sparc you'll have to introduce a new partition type (a really small amount of hacking); only MSDOS partitions can currently be used for autostart (because this is what i'm using).

of course the worst thing that should happen with this is arrays not getting started up. If anything else happens (messed-up superblocks or corrupted data) then that is a bug.

-- mingo
Re: Problems with persistant superblocks and drive removal
On Mon, 18 Oct 1999, Florian Lohoff wrote: I created 2 RAID5s with 6 disks each. I created them one after another, always disconnecting the other disks - both RAID5s were created as /dev/md1. Afterwards i duplicated the md1 entry and created an md2 attaching all 12 disks. On startup the raid code wasn't able to initialize both separate RAID5s.

could you send me the raidtab(s) you used, the commands you used to create the arrays and the startup method, plus the boot log of the failure? And which driver/patch and raidtools were you using?

-- mingo
Re: 2.2.13pre15 SMP+IDE test summary
On Wed, 6 Oct 1999 [EMAIL PROTECTED] wrote: One more pre15 test: 2.2.13pre15 with Unified IDE 2.2.13pre14-19991003 (two rejects in ide.c, one ok, one probably harmless): (5) dual P3 machine: NULL deref after 6 hours (i.e. this pre15 kernel survived longest)

is it correct that this failing kernel didn't have the RAID patch applied? I can think of these possible reasons for the SMP problems:

 (A) SMP race(s) in the IDE driver in original 2.2.13pre15
 (B) SMP deadlock in the raid-2.2.11 patch

(B) is quite unlikely if you do not have it applied and the box still crashes? My understanding is that others who had IDE+SMP problems could reproduce it without RAID as well. (RAID0 stresses the hardware harder)

-- mingo
Re: Hotswapping successes?
On Tue, 21 Sep 1999, Daniel Bidwell wrote: who has had success hotswapping scsi devices in a raid configuration? on which controllers and kernel versions? I am using a Compaq 2500 with 5 18GB disks, on Debian 2.1, kernel 2.2.12 (with raid patches). We pulled a hotswap disk out and kept on using the disk system. A couple of error messages sputtered out on the console and it kept on working. We inserted a new unformatted 18GB disk and did the raidhotremove/raidhotadd thing and it looked like it was rebuilding the raid. We rebooted the system and it came up without the new disk. I ran fdisk to partition the new disk and did the raidhotadd thing and it is rebuilding the entire disk. It takes a while. It is still running.

yes, you need to fdisk the new disk properly for autostart to work on the next startup.

-- mingo
Re: [PATCH] adjustable raid1 balancing (was Re: Slower read accesson RAID-1 than regular partition)
On Fri, 17 Sep 1999, James Manning wrote: Since the previous sysctl code had been ripped out, this was pretty [...]

James, are you patching against the latest RAID source? 2.3.18 has a painfully outdated RAID driver. (i'm working on porting the newest stuff to 2.3 right now)

[...] simple, just pulling back in the code from 2.2.11-ac3. I'm hoping that the sysctl getting ripped out was more for acceptance, since I still think speed-limit was a good idea, even as a maximum, as it helped make the array more usable...

sure, and it's present and used in the latest RAID driver ... maybe the fact that you are using the old driver explains why you see bad RAID1 performance? What performance do you see with the newest RAID driver on 2.2.12?

-- mingo
Re: Kernel probs...
On Fri, 17 Sep 1999, David A. Cooley wrote: Running Kernel 2.2.11 with the raid patch and all is well... I'm wanting to upgrade to the 2.2.12 kernel just because it's newer... The 2.2.11 raid patch had some problems on the 2.2.12 source. Is there any benefit of the 2.2.12 kernel over the 2.2.11 (for sparc) or should I just stay with the 2.2.11 for now? 2.2.12 (and the pre-2.2.13 patches) are better than the 2.2.11 kernel. If you apply the 2.2.11 patch to 2.2.12 then you'll get a single reject, which you can safely ignore. (it tries to add something that has already been merged into the main tree) -- mingo
Re: RAID0 benchmark
On 31 Aug 1999, Marc SCHAEFER wrote: Now, I just changed to have the 4 disks on the QLOGIC 1080 (U2/LVD), then 4 (2 each for each aic7xxx)

              ---Sequential Output---- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         2000 19712 97.6 85378 85.8 30903 73.0 26272 97.1 83648 92.1 323.2  3.8

And with just the 4 disks on the QLOGIC:

              ---Sequential Output---- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
         2000 19662 97.4 63567 64.3 21655 50.0 26064 96.0 68886 69.6 225.2  2.4

For me, this means that we are saturating. [...]

i think you are hitting hardware limits. First, getting 85.3 MB/sec out of your RAID0 array isn't all that bad :) But the CPU load seems to be pretty high, that could already be a limit. Also, the DMA load is probably very high (and coming from several devices) as well. It would be interesting to check out the very same benchmarks with an identical but higher-clocked CPU, to see how much the saturation point depends on CPU speed. (this might not be possible with your system i guess)

-- mingo
[oops] the limit is 27 disks! (Re: the 12 disk limit)
On Mon, 30 Aug 1999, D. Lance Robinson wrote:

#define MD_SB_DESCRIPTOR_WORDS	32
#define MD_SB_DISKS		20
#define MD_SB_DISK_WORDS	(MD_SB_DESCRIPTOR_WORDS * MD_SB_DISKS)

oops. I've just re-checked the superblock layout calculations to prove you wrong, but actually it turned out that there was a factor-of-2 error in all previous calculations! (done by several people, not only me) - the actual 'safe limit' for the maximum number of disks is 27...

I'll release a new RAID driver probably tomorrow (hopefully) with these fixes. It looks like the 'big' superblock changes can wait for some time, 27 isn't all that bad a limit after all ... this is a 100% safe solution - arrays bigger than 12 disks will not be backwards compatible, but no other problems.

ugh, this is good news indeed for lots of people, thanks Lance for pointing this out...

-- mingo
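For reference, a worked version of the arithmetic behind the 27-disk figure (a sketch: the 64-word generic and personality section sizes are assumptions here, MD_SB_SAFE_MAX_DISKS is a made-up name, and the real values should be checked against md_p.h):

#define MD_SB_BYTES		4096
#define MD_SB_WORDS		(MD_SB_BYTES / 4)	/* 1024 32-bit words */
#define MD_SB_GENERIC_WORDS	64			/* generic array information */
#define MD_SB_PERSONALITY_WORDS	64			/* personality-specific information */
#define MD_SB_DESCRIPTOR_WORDS	32			/* one disk descriptor */

/*
 * Words left for the per-disk descriptor table, after the generic and
 * personality sections and the trailing 'this disk' descriptor:
 *
 *	1024 - 64 - 64 - 32 = 864 words, and 864 / 32 = 27 descriptors.
 */
#define MD_SB_SAFE_MAX_DISKS \
	((MD_SB_WORDS - MD_SB_GENERIC_WORDS - MD_SB_PERSONALITY_WORDS - \
	  MD_SB_DESCRIPTOR_WORDS) / MD_SB_DESCRIPTOR_WORDS)	/* == 27 */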
Re: Why RAID1 half-speed?
On Mon, 30 Aug 1999, Mike Black wrote: I just set up a mirror this weekend on an IDE RAID1 - two 5G disks on the same IDE bus (primary and master).

/dev/hda:
 Timing buffer-cache reads:   64 MB in 0.95 seconds = 67.37 MB/sec
 Timing buffered disk reads:  32 MB in 3.28 seconds =  9.76 MB/sec

/dev/md0:
 Timing buffer-cache reads:   64 MB in 0.85 seconds = 75.29 MB/sec
 Timing buffered disk reads:  32 MB in 6.10 seconds =  5.25 MB/sec

could you also show the RAID0 results? That one should quite accurately show the effect of master/slave interaction. Some IDE chipsets do not handle it in a high-performance way at all.

-- mingo
Re: the 12 disk limit
On Mon, 30 Aug 1999, Lawrence Dickson wrote: I guess this has been asked before, but - when will the RAID code get past the 12 disk limit? We'd even be willing to use a variant - our customer wants 18-disk RAID-5 real bad.

yes, this has been requested before. I'm now mainly working on the 2.3/2.4 merge; it's working, but it has unearthed bugs in the main kernel. I have patches for up to ~250 disks per array, but these patches are not proven and they are a major change to the superblock layout.

until these 'big RAID' patches are merged, as a workaround i suggest you combine two (or more) RAID5 arrays with RAID0 or LINEAR to form a bigger array. Data safety is not compromised with this; system administration is a bit more complex. (also you cannot get a better than 1:12 parity/data ratio) On the bright side, this setup provides you more protection than pure 1:18 RAID5. (eg. if you do it as 2x 1:9, then each RAID5 array can tolerate one failed disk, so there is roughly a 50% chance that a simultaneous 2-disk failure will be covered by this.)

-- mingo
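As an illustration of that workaround, a hedged sketch of a raidtab stanza combining two existing RAID5 arrays into one RAID0 array (device names and chunk size are examples only; md0 and md1 would be defined as ordinary RAID5 arrays earlier in the file):

raiddev /dev/md2
	raid-level		0
	nr-raid-disks		2
	persistent-superblock	1
	chunk-size		32
	device			/dev/md0
	raid-disk		0
	device			/dev/md1
	raid-disk		1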
Re: AW: AW: more than 16 /dev/sdx ??
On Thu, 22 Jul 1999, Schackel, Fa. Integrata, ZRZ DA wrote: Thx for all your help. All works fine. I had to rebuild kernel and reboot with the new one. the 'hard limit' for per-array disks is 12. Work is underway to raise this limit. (the new code boots and works, but some migration issues have to be taken care of as the new superblock layout is incompatible) -- mingo
Re: RAID 0+1
On Thu, 22 Jul 1999, Christopher A. Gantz wrote: Also was wondering what was the status of providing RAID 1+0 functionality in software for Linux.

it works just fine:

[root@moon /root]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md2 : active raid1 md1[1] md0[0] 8467136 blocks [2/2] [UU]
md1 : active raid0 sdf[1] sde[0] 8467200 blocks 8k chunks
md0 : active raid0 sdd[1] sdb[0] 8467200 blocks 8k chunks

[root@moon /root]# df /mnt
Filesystem         1k-blocks  Used Available Use% Mounted on
/dev/md2             8201416    52   7778008   0% /mnt

you can mirror RAID0 arrays no problem.

-- mingo
Re: Raid and SMP, wont reboot after crash
On Tue, 13 Jul 1999, Michael McLagan wrote: Basic config: Supermicro P6DGU, dual Pentium III 500MHz, 512M RAM, AIC7890 chipset, dual Seagate 39140W 9.1G Medalist drives. This fails on multiple machines of the same config, so it's not a machine-specific hardware problem. Using 2.2.7 (what I've got), raid0145-19990421 and a / RAID1 partition. [...]

do you get the same problems with 2.2.10 + raid0145-19990713?

-- mingo
RELEASE: RAID-0,1,4,5 patch 1999.07.13 for 2.2.10 and 2.0.37
i have released Linux-RAID 1999.07.13, you can find the patches raid0145-19990713-2.0.37.gz, raid0145-19990713-2.2.10.gz and raidtools-19990713-0.90.tar.gz in the usual alpha directory:

http://www.<country>.kernel.org/pub/linux/daemons/raid/alpha

[mirrors should have synced up by the time you receive this email]

the patch adds the following new features:

- the failed-disk patch from Martin Bene [EMAIL PROTECTED]. With this feature it's possible to recreate arbitrary RAID superblock state. It also makes it easier to install RAID1-only systems.

- new super-fast PIII/KNI RAID-checksumming assembly routines from Zach Brown [EMAIL PROTECTED] and Doug Ledford [EMAIL PROTECTED]. These new checksumming routines not only speed up RAID5 writes and reconstruction significantly on PIII boxes, but also use the new KNI cache-control instructions to reduce cache footprint.

- initial SMP threading of RAID5 - RAID5 will be fully SMP-threaded by 2.4 time.

bugfixes in this release (only the resync bug was serious):

- some annoying documentation and sample config file errors got corrected
- raidtools abort message updated
- the 'hanging resync' bug, triggered by 2.2.8, fixed
- a slightly rewritten version of the small raid0.c additional sanity-check patch floating around on linux-raid - we'll see whether it makes a difference.
- *WARNING*: the 2.0.37 RAID patch includes the 'EGCS patch', so if you have already applied the EGCS patch you'll get rejects.

... and other small stuff. Let me know if i've missed something. While these changes might sound extensive, they are well-tested, thanks to Mike Black and others.

enjoy,

-- mingo
Re: RAID and Queuing problem
On Tue, 13 Jul 1999, jiang wrote: I'd like to know how the queueing commands are organized in a RAID system where multi-host and multi-LUN are simultaneously supported. Are all the queuing commands threaded, or only threaded on a LUN basis? Thanks

I'm not sure i understand your question. If it's about restrictions wrt. the execution of IO commands, then the answer is that there is no restriction in the RAID architecture per se - SCSI commands will be executed in arbitrary order by the block device and SCSI layers, depending on various reordering optimizations. The RAID layer itself does its own optimizations as well. Or is your question about SMP threading?

-- mingo
Re: Swap on Raid ???
On Mon, 12 Jul 1999 [EMAIL PROTECTED] wrote: The HOWTO states that swapping on RAID is unsafe, and that is probably unjustified with the latest RAID patches. yes swapping is safe. It's _slightly_ justified with RAID1 to be fair - but i've tried it myself and was unable to reproduce anything bad. Linux handles resource starvation much better these days. -- mingo
Re: Newbie: Quick patch question
On Mon, 12 Jul 1999, Solitude wrote: The reason for the question: I want to build a production box with a root raid level 1. I have this kinda sorta working on a test box right now. I have not patched the kernel at all. I compiled it myself to support initrd, but otherwise it is a stock kernel. I built it from the redhat-6.0 distribution 2.2.5-15 kernel source. I am wondering if the patches are something I need to look into or not. Red Hat 6.0 includes a very recent (and well-tested) version of the RAID code, so you need no extra patching. If you want to use kernel 2.2.10 (+) later, then you'll need to fetch the latest RAID patch from linux.kernel.org. -- mingo
RE: resync runs forever
#if 0
	if ((blocksize/1024)*j/((jiffies-starttime)/HZ + 1) + 1 > sysctl_speed_limit) {
		current->priority = 0;
		                    ^^

this is the real bug, it should be:

		current->priority = 1;

yeah, stupid bug. You don't have to comment out the whole speed-limit stuff (it's rather useful, you'll notice).

-- mingo
Re: RAID0 and RedHat 6.0
On Mon, 17 May 1999, Robert McPeak wrote: Here are the relevant messages from dmesg:

hdd1's event counter: 000c
hdb1's event counter: 000c
request_module[md-personality-2]: Root fs not mounted
do_md_run() returned -22

hm, this is the problem: it tries to load the RAID personality module but cannot find it, because the root fs is not yet mounted. But 'md-personality-2' is strange as well, it should be 'md-personality-0' for RAID0, there is no personality-2 ...

when you run it manually:

raid0 personality registered

then it correctly registers raid0. You'll definitely get rid of these problems if you compile RAID into the kernel (but this is only a workaround), these things are supposed to work. I'm not sure yet what's going on.

-- mingo
Re: RAID and RedHat 6.0
On Sun, 9 May 1999, Charles Barrasso wrote: I recently upgraded one of my computers to RedHat 6.0 (which includes raid .90). Before the upgrade I had 2 4.1GB SCSI HDDs combined into a linear RAID array (created with raidtools-0.50beta10-2). After the upgrade I went to re-instate this array and put the following into my /etc/raidtab:

raiddev /dev/md0
	raid-level	linear
	nr-raid-disks	2
	device		/dev/sdb1
	raid-disk	0
	device		/dev/sdc1
	raid-disk	1

but when I run raidstart -a I get:

[root@news /root]# /sbin/raidstart -a
/dev/md0: Invalid argument

this is the correct raidtab entry for your config:

raiddev /dev/md0
	raid-level		linear
	nr-raid-disks		2
	persistent-superblock	0
	chunk-size		8
	device			/dev/sdb1
	raid-disk		0
	device			/dev/sdc1
	raid-disk		1

to get your array running simply do:

	raid0run /dev/md0

and that's all. Note that to get 'raid0run' you'll have to get and install the latest raidtools. (RedHat 6.0 includes the latest code, but raid0run was added shortly after RH 6.0 was released; raid0run will show up in an errata)

let me know if it still doesn't work,

-- mingo
Re: Raid0 created with old mdtools
On Thu, 29 Apr 1999, Tuomo Pyhala wrote: I upgraded to RH6.0 on one machine having a raid0 created with some old version of mdtools. However the new code seems to be unable to start it, complaining about the superblock magic. Has the superblock been changed/added in newer versions, making them incompatible with old versions, or is there some option i can use to get the raid0 running and mounted?

please upgrade to raidtools-19990421-0.90.tar.gz, that raidtools version handles the RH 6.0 kernel just fine. First create the correct /etc/raidtab. Then use 'raid0run /dev/md0' or 'raid0run -a' in your init scripts to start up the old array.

-- mingo
Re: auto-partiton new blank hotadded disk
On Mon, 26 Apr 1999, Benno Senoner wrote: I am interested more in the idea of automatically repartitioning a new blank disk while it is hot-added.

no need to do this in the kernel (or even in raidtools). I use such scripts to 'mass-create' partitioned disks:

[root@moon root]# cat dobigsd
if [ "$#" -ne "1" ]; then
	echo 'sample usage: dobigsd sda'
	exit -1
fi
echo "*** DESTROYING /dev/$1 in 5 seconds!!! ***"
sleep 5
dd if=/dev/zero of=/dev/$1 bs=1024k count=1
(for N in `cat domanydisks`; do echo $N; done) | fdisk /dev/$1

[root@moon root]# cat domanydisks
n e 1 1 200
n l 1 25
n l 26 50
n l 51 75
n l 76 100
n l 101 125
n l 126 150
n l 151 175
n l 176 200
n p 2 300 350
n p 3 350 400
n p 4 450 500
t 2 86
t 3 83
t 4 83
t 5 83
t 6 83
t 7 83
t 8 83
t 9 83
t 10 83
t 11 83
t 12 83
w

that's all, fdisk is happy to be put into scripts.

-- mingo
Re: auto-partiton new blank hotadded disk
On Mon, 26 Apr 1999, Benno Senoner wrote: no need to do this in the kernel (or even in raidtools). I use such scripts to 'mass-create' partitioned disks: but isn't it unsafe to overwrite the partition table of disks which are actually part of a soft-raid array and in use? it's unsafe, and thus the kernel does not allow it at all. Why don't you create the partitions before hot-adding the disk? -- mingo
Re: A couple of... pearls?
On Sat, 24 Apr 1999, Andy Poling wrote: I agree completely with the first statement. But the second sounds somewhat odd to me. I can hotadd or hotremove a disk on Linux with sw RAID and a non-hot-swap-capable controller; maybe this is another feature of sw RAID over hw RAID? Because you're _supposed_ to quiet the SCSI bus while you're swapping your disk, to prevent errors in active requests when you're removing or inserting a device on the bus.

we could as well provide kernel functionality to turn a particular SCSI bus off/on (from within the kernel) by delaying IO requests. This has to be done carefully to avoid deadlocks (what if the code to turn the bus on lies on a disk on that bus :), but it can be done, i think, without hardware assistance.

-- mingo
Re: Global hot-spare disk?
On Sun, 25 Apr 1999, Steve Costaras wrote: I'm playing this weekend with v2.2.6 plus the new patches on a spare server, trying to get boot-raid working or to see how far off it is. Anyway, I noticed that the current code doesn't seem to allow a 'global hot spare' disk for the raid arrays. On my test system here I have 3 arrays (raid 1 and raid 5) and instead of keeping one hot-standby disk for each array I'd like to keep one disk (obviously large enough to accommodate any of the arrays) as a hot spare to be used in any array that needs it.

good idea, i've added this to the TODO list. Until this is implemented you can keep it 'quasi-global' by raidhotremoving it, and adding it only if a disk fails. Probably a small script (to watch for failures) is needed for this too.

-- mingo
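Until that shows up, a rough sketch of the kind of watcher script mentioned above (the array names, the spare partition and the '(F)' failed-member marker in /proc/mdstat are assumptions to adapt to the local setup):

#!/bin/sh
# hot-add the shared spare to the first array that shows a failed member
SPARE=/dev/sdh1

while true; do
	for md in md0 md1 md2; do
		if grep "^$md : " /proc/mdstat | grep -q '(F)'; then
			raidhotadd /dev/$md $SPARE && exit 0
		fi
	done
	sleep 60
done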
Re: Re-naming raid arrays?
On Thu, 22 Apr 1999, Steve Costaras wrote: I have a raid array /dev/md0 on a system here. I am now looking at moving some things around and want to rename this to say /dev/md9 or whatever. Since this data is mapped (initially) out of the /etc/raidtab file and then stored in the raid superblock, is there a way to update this WITHOUT losing data on the device? I.e., I'm keeping all the disks/partitions that make up the device the same, I just want to change its offset (to free up 900 to create a boot-raid device). Is this possible, and if so, how?

there is no 'safe' way yet to change the number of a persistent-superblock RAID array. You can do it by taking the array down (stopping it), and recreating it with a modified /etc/raidtab. BUT! doing this you lose all protection against device mixups (for this one single mkraid, that is), so you have to be very careful to have the right partitions mentioned. Especially on RAID1 and RAID5, reconstruction starts immediately, overwriting what it thinks is redundant data ... but yes, this works.

Alternatively there could be a safe 'tuneraid' utility to do various smaller changes to an array. (one such function would be to change the number of the array, another one could be to make an array's config 'immutable')

-- mingo
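For completeness, a sketch of the unsafe-but-workable renumbering procedure described above (triple-check the partition list before the mkraid step; the exact force option depends on the raidtools version):

# stop the array under its old name
raidstop /dev/md0

# edit /etc/raidtab: change 'raiddev /dev/md0' to 'raiddev /dev/md9',
# keeping every device/raid-disk line exactly as it was

# re-create the superblocks under the new name
# (destructive if the raidtab is wrong!)
mkraid -f /dev/md9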
Re: New patches against v2.2.6 kernel?
On Sat, 17 Apr 1999, Steve Costaras wrote: Has anyone created any patches against the new 2.2.6 kernel? The latest I've seen is against v2.2.3, which doesn't apply cleanly against the newer kernels. i'll release it Real Soon. (probably this weekend) Also, just a side question: what's the status of possibly merging the raid code into the current kernel? I for one have been running raid 5 here on several systems for about a year with no problems. Or am I just lucky? yep, the upcoming release will be a candidate for a merge. -- mingo
Re: Swap on raid
On Wed, 14 Apr 1999 [EMAIL PROTECTED] wrote: Hi folks, we are trying to set up a mirrored (raid-1) system for reliability but it is not possible according to the latest HOWTO to swap onto a raid volume. Is there any change on this? it does work for me (i do not actually use it as such, but i've done some stresstesting under heavy load). Let me know if you find any problems. -- mingo Has anyone set up a system like this with/without swap configured and what is your experience? We have decided to disable swap for the moment, does anyone know if this causes any problems with the general usage of linux. If we enable swap then the system will almost certainly crash if the disk with the swap partition crashes which would make mirroring the file systems a waste of time. Please correct me if I am wrong. Presumably swapping to a file has the same problems. If swap does not work on a mirrored volume are there plans to make it work in the future? If not does anyone know what it would take. Perhaps we can help. Brian Murphy
Re: Swap on raid
On 14 Apr 1999, Osma Ahvenlampi wrote: Ingo Molnar [EMAIL PROTECTED] writes: it does work for me (i do not actually use it as such, but i've done some stresstesting under heavy load). Let me know if you find any problems. Hmm? Since when does swapping work on raid-1? How about raid-5? i've tested it on RAID5, swapping madly to a RAID5 array while parity is being reconstructed works just fine. -- mingo
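For anyone who wants to try what the two messages above describe, the setup is the same as for any other block device (a sketch; /dev/md1 and the fstab line are examples only):

# build the RAID1/RAID5 array as usual, then:
mkswap /dev/md1
swapon /dev/md1

# or permanently, via a line in /etc/fstab:
# /dev/md1   swap   swap   defaults   0 0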
Re: lockup with root raid1 linux 2.2.1
On Tue, 23 Mar 1999, Thorsten Schwander wrote: System: dual Pentium 450, linux 2.2.1 SMP, BusLogic BT-958, two 4.5 GB SCSI disks with root RAID1, raid0145 and raidtools from mid-February, BusLogic scsi driver compiled into the kernel. Symptoms: solid freeze, no response to Alt-SysRq, ping, etc. A web server is running on the machine, otherwise it was more or less idle.

2.2.1 has a couple of known problems which are fixed in 2.2.3. (or better, try 2.2.4-pre6)

Is this a sign of hardware problems or could there be a problem with software raid? Is there anything I could do to help debugging?

next time please do a Ctrl-ScrollLock and check out the process list, to see which processes are running. Let me know if it happens again with a newer kernel and RAID version 19990309.

-- mingo
Re: persistent-superblock 0 makes raidstart fail.
On Fri, 19 Mar 1999, Piete Brooks wrote: Should raidtools-19990309-0.90 manage a linear device without a SB ? [ I can "mkraid" it, but once stopped, it can never be restarted ] md8 fails, md7 is fine. Since it's nonpersistent, it can only be re-created. The 'old' mdadd+mdrun was always re-creating arrays as well. raidstart (and autostart) starts only 'persistent' arrays. -- mingo
Re: RAID1 experiences - patches
just to add one more point: i was waiting for 2.2 to stabilize before moving the RAID driver to 2.2.x. But when patches began floating around porting the RAID driver to 2.2.x, i decided to move the 'official' patch to 2.2.x too. This has resulted in at least two bogus 'RAID problems' so far: the out-of-memory thing is a generic kernel bug fixed in pre2-2.2.2, and the 'crash when MMX' problem seems to be bogus as well. So the seemingly increasing number of bugs is actually mostly a side-effect of 2.2 stabilization.

The difference in user-space tools between the latest 'stable' RAID package and the current version is unfortunate but unavoidable. Still waiting for someone with better documentation skills than mine to pick up the maintenance of the RAID docs :)

-- mingo
Re: LOTS OF BAD STUFF in raid0: raid0145-19990824-2.2.11 is unstable
On Fri, 5 Nov 1999, David Mansfield wrote: Well, I've never gotten a single SCSI error from the controller... not to mention that the block being requested is WAY beyond the end of the device. If this wasn't a RAID device, this would be one of the 'Attempt to access beyond end of device' errors that non-raid users have reported many times for the 2.2 series kernels. I have also gotten the error when not under any load, about once a month or so, but never with the alarming frequency of last night!

it's 99.99% a problem with the disk. The RAID0 code has not had any significant changes (due to its simplicity) in the last couple of years. We never rule out software bugs, but this is one of those cases where it's way, way down the list of potential problem sources.

-- mingo