Re: FAQ update
[Luca Berra] > >The patches for 2.2.14 and later kernels are at > >http://people.redhat.com/mingo/raid-patches/. Use the right patch for > >your kernel, these patches haven't worked on other kernel revisions > >yet. > > i'd add: dont use netscape to fetch patches from mingo's site, it hurts > use lynx/wget/curl/lftp Yes, *please* *please* *please* -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: FAQ
[Luca Berra] > from the info page from gnu tar 1.13.17: > > `--bzip2' > `-I' > This option tells `tar' to read or write archives through `bzip2'. As mentioned previously, this is a distro-specific hack. I have it in my tar as well, but trusting it to be part of core GNU tar just because it works on your system is silly. Version 1.13 is the latest at ftp://ftp.gnu.org/pub/gnu/tar/ and specifically mentions the bzip2 situation in its NEWS file: +++ * An interim GNU tar alpha had new --bzip2 and --ending-file options, but they have been removed to maintain compatibility with paxutils. Please try --use=bzip2 instead of --bzip2. +++ Checking the ChangeLog shows bzip2 support added 1999-02-01 (in the form of -y, --bzip2, and --bunzip2) and then removed 1999-06-16. In any case, it certainly is true that we can trust -z to be around on any standard Linux install, and as such it is the correct answer to this thread. -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: FAQ
[Marc Mutz] > >2.4. How do I apply the patch to a kernel that I just downloaded from > >ftp.kernel.org? > > > >Put the downloaded kernel in /usr/src. Change to this directory, and > >move any directory called linux to something else. Then, type tar > >-Ixvf kernel-2.2.16.tar.bz2, replacing kernel-2.2.16.tar.bz2 with your > >kernel. Then cd to /usr/src/linux, and run patch -p1 < raid-2.2.16-A0. > >Then compile the kernel as usual. > > Your tar is too customized to be in a FAQ. There is no bzip2 standard in GNU tar, so let's be intelligent and avoid the issue by going with the .gz tarball as a recommendation. -z is standard. Also, none of the tarballs will start with "kernel-" but "linux-" anyway, so that needs fixing. Also, I'd add "/path/to/" before the raid in the patch command, since otherwise we'd need to tell them to move the patch over to that directory (pedantic, yes, but still). Oh, and "move any directory called linux to something else" seems to miss the possibility of a symlink, where renaming the symlink would be kind of pointless. Whether tar would just kill the symlink at extract time anyway is worth a check. -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
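For reference, the corrected FAQ steps might read something like this (a sketch only, reusing the file names from the thread and assuming the .gz tarball):
  cd /usr/src
  mv linux linux.old              # or just remove a stale "linux" symlink
  tar -zxvf linux-2.2.16.tar.gz   # -z is standard, unlike the distro-specific -I
  cd /usr/src/linux
  patch -p1 < /path/to/raid-2.2.16-A0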
Re: OT: best cross-OS filesystem
[Edward Schernau] > Sorry to waste bandwidth, but I'm looking at a way for better > cross-OS performance on my "shared" partition - are there ext2fs > drivers for NT somewhere, or maybe hpfs drivers for NT? I have some > very large directories with 100's of files, and I want to be able to > get in and around them easily... FAT32 appears to be the dominant cross-OS filesystem of choice, combining long-filename support with native read-write capability in Linux, 95/98, and NT/2000. James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: Determining a failed device
[Kirk Patton] > The status should be: > md0 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0] > 71681024 blocks level 5, 256k chunk, algorithm 0 [5/5] [U] 5 active, 1 standby (6 raid disks total) > The status is: > md0 : active raid5 sdf1[4] sde1[4](F) sdd1[3] sdc1[2] sdb1[1] sda1[0] > 71681024 blocks level 5, 256k chunk, algorithm 0 [5/5] [U] 5 active, 1 failed (6 total). This is a snapshot after the rebuild has already occurred (or the drive that failed was the spare, but that's unlikely given typical ordering conventions) > I noted the (F) by sde1. Does this stand for > failed? Is there any references to the types of > errors that will be reported in the syslog or > /proc/mdstat? yes, F is failed. > Personalities : [raid5] > read_ahead 1024 sectors > md0 : active raid5 sdg1[6] sdf1[5] sde1[4] sdd1[3] > sdc1[2] sdb1[1](F) sda1[0] 106653696 blocks level > 5, 256k chunk, algorithm 0 [7/6] [U_U] > unused devices: > > Reading this status from /proc/mdstat, I am > thinking that the raid is running in degraded mode > with "sdb1" as the failed drive. The [7/6], does > that mean that there are 7 devices and only 6 are > currently running? yup, that's degraded. You'll want to raidhotremove the sdb1 and raidhotadd a new partition (possibly sdb1 after that drive gets replaced, depending on your controller and other factors) and it'll rebuild onto the new drive. James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
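A minimal sketch of the replacement sequence described above (device names are examples only; adjust for what your controller actually calls the replacement disk):
  raidhotremove /dev/md0 /dev/sdb1   # drop the failed partition from the array
  # swap in the replacement drive, partition it (type fd is a good idea)
  raidhotadd /dev/md0 /dev/sdb1      # start reconstruction onto the new partition
  cat /proc/mdstat                   # watch the rebuild progress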
Re: 2.4.0 autodetect patch
[Nick Kay] > Better still would be a pointer to the linux-raid archives - I can't > find them even if they do exist. It still cracks me up that typing "linux-raid archive" into google returns such a long list yet people swear they can't find them. Interesting. Anyway, everyone's got their favorite, but mine is: http://www.mail-archive.com/linux-raid@vger.rutgers.edu/ -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: raid newbe!
[Fredrik Lindström] > I've been searching for a RAID howto or something like that > What I'm after is the software raid in linux Go to http://www.linuxdoc.org Under HOWTOs, look for "Software RAID" http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: DPT PM3334
This is what I get for not having coffee before reading my email. > > I have been trying to get Red Hat 6.2 to install > > on my DPT PM3334 raid controller, but > > I just read somewhere that Red Hat does not > > support installing the boot partition > > onto the raid array. In all cases I've seen, any device for which there is a supporting module can have that module shoved into an initrd just fine. Any good raid controller only shows a logical disk to the OS anyway, so hardware raid situations are usually much easier for booting off of raid than s/w ones. James, who *still* needs to go downstairs and get some C8-H10-N4-O2 -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
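For what it's worth, shoving a controller module into an initrd on Red Hat is usually a one-liner; a sketch, where both the module name (a guess at the DPT driver) and the version string are just placeholders:
  mkinitrd --with=eata /boot/initrd-2.2.16.img 2.2.16
  # then point initrd= in lilo.conf at the new image and re-run lilo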
Re: DPT PM3334
[Souvigna Phrakonkham] > Hello, has anyone put a boot partition on the > "raid" array drives? If so which > distribution of linux? You can make it work on any distro, but afaik the only installer that currently has "native support" is RH 6.2's gui installer. > I have been trying to get Red Hat 6.2 to install > on my DPT PM3334 raid controller, but > I just read somewhere that Red Hat does not > support installing the boot partition > onto the raid array. It does, for RAID-1 (mirroring): no striping involved, and their lilo has been patched to support the raid-1 device. Using Disk Druid, it's pretty straightforward to make a couple of partitions and "make raid device", so you should be ok. The steps to retrofit a bootable raid onto an existing system are a bit tedious, but covered in the Boot+Root RAID Howto and Software RAID Howto, both at linuxdoc.org. James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: Raid1: How to verify that mirroring is functioning
[root] > Hi, Hello. > I've created mirrored striped arrays (Raid10) and am not confident that > my first striped set is in fact being mirrored on my second striped set. First question: did you make backups? :) > When the mirrored mdX devices are created, cat /proc/mdstat does show > that re-synching is taking place. However, if I mount an mdX that is > part of my second striped set, I see NO files, just a lost+found > directory. Hmm, I didn't mount as read-only. Is this significant? Any chance we could see your /proc/mdstat output? > What techniques can I use to verify that the second striped set is being > mirrored? Is there a raidtool to force resynching? mkraid'ing md10-14 will need to write to the ends of md0-9, possibly corrupting the filesystems already in place (with the blessed data being on md0-4, it would appear). Although it's not broken out as a separate section, the method for getting a mirror made of already in-place data isn't extremely nice, but it has been effective for many in the past. It's covered as "Method 2" at: http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO-4.html#ss4.12 If you have an ext2 resizer that you trust to shrink the fs enough for the raid superblock, you can try that and avoid the step of copying over data manually. Not recommended, of course, but it's a possibility. > If, perchance, an mdX on the first-striped set has a problem, will the > mirrored device kick in and re-synch the striped mdX with the problem? > When this happens (as I'm sure it probably will at some point), how will > I know that it is occurring? I am guessing that the first striped set > will be out of operation until it is repaired by re-synching with the > mirrored set. > > How can mirroring be effectively used & monitored? The major problem here is that once you create (via the failed-disk method) the raid10, you *need* to start mounting the md10-14 devices. Manually dealing with the underlying md0-9 devices isn't supported after that point. It boils down to the fact that raid1 is "write to md10, mirror the writes across md0 and md5" and not "the raid1 module should catch all writes to md0 and automatically mirror them to md5". You have to use the raid1 mdX device you created or you best-case lose raid1 functionality, worst-case lose data. > fstab file: > > /dev/md1  /local  ext2  defaults  1 2 > /dev/md0  /opt    ext2  defaults  1 2 > /dev/md4  /tmp    ext2  defaults  1 2 > /dev/md2  /usr    ext2  defaults  1 2 > /dev/md3  /var    ext2  defaults  1 2 After the "method 2" (failed-disk) steps to get the mirrored/striped raid10's up and running, you'll need to change these by "adding 10" to each (md11, md10, md14, md12, md13) so you're using the raid10 devices and not an underlying raid0 device. HTH, HAND James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4 PGP signature
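To make the "Method 2" (failed-disk) approach concrete, a hypothetical raidtab stanza for one of the mirrors could look like this, with md5 being the fresh striped set and md0 the one holding the existing data:
  raiddev /dev/md10
      raid-level            1
      nr-raid-disks         2
      nr-spare-disks        0
      chunk-size            4
      persistent-superblock 1
      device                /dev/md5
      raid-disk             0
      device                /dev/md0
      failed-disk           1
After mkraid /dev/md10 you mke2fs and mount md10, copy the data over from md0, then change failed-disk back to raid-disk and raidhotadd /dev/md10 /dev/md0 to start the mirror resync. That's only a sketch of the HOWTO's method, not a recipe to run blindly.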
Re: general question.
[Roman Seibel] > comp:~/ # mkraid /etc/raidtab > mkraid version 0.36.4 http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html Specifically, the "requirements" section 1.2 http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO-1.html#ss1.2 HTH, James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4 PGP signature
Re: 2.2.16, "device too small (0 kB)"
[Marc Haber] > I am trying to build a RAID 1 with two disks on a new system. Linux is > Debian potato, kernel 2.2.16 patched with raid-2.2.16-A0, raidtools > built from raidtools-dangerous-0.90.2116.tar.gz. So far so good. > | Device Boot  Start  End  Blocks  Id  System > |/dev/hda7  38  2501  19792048+  fd  Linux raid autodetect > > | Device Boot  Start  End  Blocks  Id  System > |/dev/hdb7  38  2501  19792048+  fd  Linux raid autodetect Looks fine. > |haber@gwen[7/58]:~$ cat /etc/raidtab > |raiddev /dev/md0 > |raid-level 1 > |nr-raid-disks 2 > |nr-spare-disks 0 > |chunk-size 4 > |persistent-superblock 1 > |device /dev/hda7 > |raid-disk 0 > |device /dev/hdb7 > |raid-disk 1 Also good. > However, when I finally try to build the RAID, this is what happens: > |haber@gwen[8/59]:~$ sudo mkraid /dev/md0 > |handling MD device /dev/md0 > |analyzing super-block > |/dev/hda7: device too small (0kB) > |mkraid: aborted, see the syslog and /proc/mdstat for potential clues. > |haber@gwen[9/60]:~$ cat /proc/mdstat > |Personalities : > |read_ahead not set > |unused devices: > |haber@gwen[10/61]:~$ > > Nothing is written to syslog. Being a non-primary partition shouldn't be a problem (there was the autodetection issue iirc, but that shouldn't matter here). The only time I've seen "device too small" was when I was accessing a device that didn't have a proper /dev entry. The fdisk -l probably only needed /dev/hda to be valid, but for the mkraid to succeed /dev/hda7 will need to be valid (3,7). Not likely, but that's the only time I saw it. -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4 PGP signature
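If a missing /dev entry really is the culprit, creating it is quick (major 3, minor 7 being hda7 on the first IDE channel):
  ls -l /dev/hda7 || mknod /dev/hda7 b 3 7
  chmod 660 /dev/hda7   # match the permissions on your other disk devices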
Re: Packages needed
[Micah Anderson] > According to the RAID HOWTO > (www.linuxdoc.org/HOWTO/Root-RAID-HOWTO-2.html) you are supposed to have > the following packages: [snip] > So, is this HOWTO not useful to me? If that is true - I haven't been able > to find a HOWTO elsewhere that addresses the ".90 raidtools and > accompanying kernel patch to the ...2.2x...series kernels". http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html It would be nice if the Root-RAID-HOWTO desc. included a link -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4 PGP signature
Re: where is the archive?
[Sandro Dentella] >i wanted to browse the mailing -list archive before bothering jou w/ my >problems but I coudn't find any: where are they? http://www.mail-archive.com/linux-raid@vger.rutgers.edu/ Also, read the HOWTO http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: performance statistics for RAID?
[Gregory Leblanc] > Is there any chance of keeping track of these with software RAID? AFAIK, sct's patch to give sar-like data out of /proc/partitions gives all of the above stats and more... neat patch :) The user-space tool should be in the same dir. And, FWIW, I get asked about how people can get a "sar" for Linux *very* often by the SCO people here at work. James
Re: Reiser to the occasion
[Henry J. Cobb] > but if you've got a journaling filesystem, wouldn't you want to expose the > raw disks to it so it can choose to put the journals on different disks > than the files? Funny, since sct/ext3 is the only one that appears to be pushing to keep alive the possibility of journaling to other devices (nvram for one, which is definitely a good idea). In one sense, creating an external dependency for the recovering of your data can be a Bad Thing. > This would not only help with performance, but it would also make recovery > as simple as using one of the surviving journal copies and applying that > against the last full backup of the main file system. (I.e. you lose 10 > disks out of your 12 disk "array" and wind up not losing a single byte of > data.) journals aren't *nearly* that deep. journal transaction entries can get overwritten (circular buffer) as soon as the full transaction has been committed to disk. It does *not* keep all transactions around since your last full backup (how would it even know? :) Journaling != RAID != LVM != Backups. They all serve their own purpose, and invariably trying to use one to cover the tasks of others *will* bite you eventually (as we have seen on this list multiple times) James
Re: Easy way to convert RAID5 to RAID0?
[[EMAIL PROTECTED]] > Yes, I know that. Unfortunately, I'm working on an extremely > insert-heavy application (over 100 million records per day). I would > really like ReiserFS (due to the large file size as well as for the > journaling). I don't see how RAID5 can meet my needs. FWIW, ReiserFS won't get you much unless there are large numbers of files involved. I run s/w raid0 over h/w raid5 with ext2 specifically because it's faster for my situation with relatively low file counts (about 100 files per directory). James
Re: Easy way to convert RAID5 to RAID0?
[[EMAIL PROTECTED]] > I find that my RAID5 array is just too slow for my DB application. I > have a large number of DB files on this array. I would like to > convert to RAID0, and I can back up my files, but I was wondering if > there is a way to convert without reformatting? Not currently, although it may be worth reconsidering a conversion from 5 -> 0 if you can alleviate your performance problems with other methods (chunk size, -R stride=, reiserfs, more memory, etc) Just a thought, although for anything OLTP-ish you're going to be so insert- and update-heavy that I'm sure raid5's going to be less than ideal for some performance requirements... Keep in mind that you won't be able to survive through a disk failure like you can now, though (I know you already know this, just want to rehash :) James
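As an illustration of the -R stride= tuning mentioned above (numbers are hypothetical): with a 64k chunk size and 4k ext2 blocks, stride is chunk/block = 16, so the filesystem would be created with
  mke2fs -b 4096 -R stride=16 /dev/md0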
Re: raid 0 problems after kernel upgrade
[blair christensen] > hello, > rh 6.2 on a dell poweredge 4400 box. it was running 2.2.14-5 with a > raid 0 array. i upgraded the kernel to 2.2.16 and i am now having > problems with the raid device (/dev/md0). You didn't patch your 2.2.16 (www.redhat.com/~mingo/raid-patches). > when i try to mount the device, i get: Check /proc/mdstat before trying mounts, tune2fs, or other things. It should show that you don't have an active md0, so subsequent attempts to use md0 will certainly fail. HTH, HAND James
Re: Raid 5. Lost 2 drives.
[m.allan noah] > > The howto says try mkraid --force. With a 2 drive (2/4) will I lose > > everything. > > why do you want to make a two drive raid5? that makes no sense. use raid1. If you *read* his message you'll notice that he has 4 drives in the array and lost 2 of them (2 still active). :) > yes- if there is data already on the drive, running mkraid is a pretty sure > way to destroy the filesystem, since part of the file system will be > overwritten. Incorrect. if it was a s/w raid device already, then nothing gets touched except the raid super-block that was already there. Resync may occur, but there are mkraid options to keep that from happening too. James
Re: Raid-Failure, please help
[Jochen Haeberle] > does not recreate automatically... The problem mentioned striking me > most is "md0 has overlapping physical units with md2"... this does > not sound very good to me... That's informative about resync operations. It is not an error. > May we run fsck on the md devices??? sure, as long as the devices are active (check /proc/mdstat) James
Re: 2.2.16 RAID patch
[Matthew DeFoor] > I hate to bother the list with this, but...I have been unable to get > Redhat 6.1/2.2.16+raid-2.2.16-A0 working with Root RAID1. > > image=/boot/bzImage > label=linux > initrd=/boot/initrd-2.2.16.RAID.img > read-only > root=/dev/md0 > > request_module[md-personality-3]: Root fs not mounted > do_md_run() returned -22 re-make your initrd and include --with=raid1 (just did the same thing at our installfest last weekend :) James
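For reference, re-making the initrd on Red Hat goes roughly like this (the image name is taken from the lilo.conf above; substitute your real kernel version string, i.e. whatever the directory under /lib/modules is called):
  mkinitrd --with=raid1 /boot/initrd-2.2.16.RAID.img 2.2.16
  lilo   # re-run lilo so the new initrd gets picked up
(Remove or rename the old image first if mkinitrd complains that it already exists.)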
Re: bonnie++ for RAID5 performance statistics
[Gregory Leblanc] > Sounds good, James, but Darren said that his machine had 256MB of ram. I > wouldn't have mentioned it, except that it wasn't using enough, I think. it tries to stat /proc/kcore currently. no procfs and it'll fail to get a good number... I've thought about other approaches, too, but since this is just a fall-back mechanism when the person doesn't specify a size (like they should), I don't give it much worry. Patches always welcome, though, of course :) > a side note, I think that 3x would be a better number than 4, but maybe it's > just me. I've got multiple machines with 256MB of ram, but only 1GB or 2GB > RAID sets. 4x ram would overflow the smaller RAID sets. I've thought about parsing df output of the $dir and clamping on that, but I haven't gotten around to it yet. Keep in mind, this is still all fall-back... you should be passing the right value in the first place :) James
Re: bonnie++ for RAID5 performance statistics
[Gregory Leblanc] > > [root@bod tiobench-0.3.1]# ./tiobench.pl --dir /raid5 > > No size specified, using 200 MB > > Size is MB, BlkSz is Bytes, Read, Write, and Seeks are MB/sec > > Try making the size at least double that of ram. Actually, I do exactly that, clamping at 200MB and 2000MB currently. Next ver will up it to 4xRAM but probably leave the clamps as is. (note: only clamps when size not specified... it always trusts the user) James
Re: Linux raid 5 recovery
[Wishart, Aaron M. (James Tower)] > I have a raid5 file system consisting of 8, 9-gig quantum scsi drives (scsi > id 0-6, 8). The drive with the scsi id of 1 failed. I replaced the drive > and ran "raidhotadd /dev/scb /dev/md0" It appeared to run so I left for the > weekend. When I came in this morning the syslogd was using 75% of the cpu > and outputting "kernel: raid5: md0: unrecoverable error I/O error for block > #" from some kind of loop it apparently failed around 4:00am Saturday ( > I started the restore at about 4:00 Friday afternoon). - you'd typically do something like "raidhotadd /dev/md0 /dev/sdb1" instead, after replacing the disk, making sure it came back as sdb (as per kernel log), fdisk'ing to make a partition with type fd (no, not 100% necessary, but almost always a good idea) then doing the raidhotadd. - After the raidhotadd you'd check /proc/mdstat to confirm the array is reconstructing on the new drive (partition, really). Aside from those two (which I don't think is really the issue, but worth clarifying), I'd say there's the possibility that another drive gave an error (maybe a soft error, the raid code doesn't really differentiate and can get quite picky even if the underlying drive successfully remapped the sector) without the resync completed (resync's seem to take much longer than they should, but maybe that's just me... I mirror entire drives in 20 minutes, but resync's seem to take over a dozen hours) Good luck, James
Re: Forcing Rebuild/Reconstrution
[Peter Hircock] > d) raidhotadd /dev/md2 /dev/hdc3 > Don't have raidhot add. raidhotadd is a symlink to raidstart that gets created when you do the "make install". Might wanna check you've done that and then check that the /sbin directory is in your path (or wherever you installed the raidtools) James
Re: HELP with autodetection on booting
[Gregory Leblanc] > I started seeing this when I blew away my RAID0 arrays and put RAID1 arrays > on my home machine. I suspect that this is cause by RedHat putting > something in the initscripts to start the RAID arrays AND the RAID slices > being set to type fd (RAID autodetect), but I haven't been able to confirm > this. And since I just totaled my RH install, it may be a couple of weeks > before I get back to look some more. Just to confirm :) /etc/rc.d/rc.sysinit will attempt to activate any /etc/raidtab entries that aren't listed as active already in /proc/mdstat. This can certainly be a nuisance in some cases, but I guess they feel it works well in most cases (and they may be right). Certainly can cut down the need for partition types of "fd", although it would appear to be more important to keep your raidtab aligned with reality (that's a good practice anyway, since we may need it for recovery later on). James
Re: HELP with autodetection on booting
[Jieming Wang] > autorun ... > considering sdb1 ... > adding sdb1 ... > adding sda1 ... > created md0 > bind > bind > running: > now! > sdb1's event counter: 000a > sda1's event counter: 000a Looks like a couple of partitions with type fd, looking great for autostart by the raid code. > kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2 > do_md_run() returned -22 Doh! More likely than not, you'll want to build the necessary raid levels into the kernel. Otherwise, you end up in a chicken-and-egg problem (possibly, depending on fs layout) where you need to load a module from a filesystem that you can't get to without the module loaded. James
Re: Any distro with automated raid setup?
[Slip] >I'm wondering if anyone has run into a distribution of linux that >has software raid-util's pre-packaged into it, or available in a third >party package. I'v been trying to setup software raid with three 2.1G >SCSI drives for quite a while now and am simply looking for an easier >sollution. Any pointers/suggestions? FWIW, the Red Hat 6.2 installer is the only one I know of that's software-raid aware enough to create them at install time and even boot from them (raid1 only at the moment). Red Hat 6.2 is also the only distro (AFAIK at least) that has a lilo patched to understand software raid devices (although you can certainly apply the patch yourself or install RH 6.2's lilo package). James
Re: HELP!!! Broken raid0
[Matthew Burke] > On Sun, 28 May 2000, James Manning wrote: > > [Matthew Burke] > > > e2fsck 1.18, 11-nov-1999 for EXT2 FS 0.5b, 95/08/09 > > > e2fsck: Attempt to read block from filesystem resulted in short read while > > > trying to open /dev/md1 > > > Could this be a zero-length partition? > > mdstat: > > Personalities : [raid0] > read_ahead 1024 sectors > md0 : active raid0 hdc1[1] hda3[0] 1606272 blocks 64k chunks > unused devices: No active /dev/md1, so e2fsck failing is normal. > hda: ST36531A, 6204MB w/128kB Cache, CHS=790/255/63, (U)DMA > hdb: IBM-DJNA-351520, 14664MB w/430kB Cache, CHS=1869/255/63, (U)DMA > hdc: ST36531A, 6204MB w/128kB Cache, CHS=13446/15/63, (U)DMA > > *** edited note from matt - the CHS values have always been different for > some unknown reason... AFAIK, you simply have one drive in LBA mode and not the other. In my experience it's just a BIOS setting difference, but you're under 8GB anyway so I'm not sure it really makes a difference. > autodetecting RAID arrays > (read) hda3's sb offset: 787072 [events: 0063] > (read) hda4's sb offset: 5470016 [events: 005e] > (read) hdc1's sb offset: 819200 [events: 0063] > (read) hdc3's sb offset: 5470016 [events: ] > md: invalid superblock checksum on hdc3 Sure makes it look like hdc3 has some major issues. It has a partition type of fd, but invalid raid superblock. Makes me wonder if e2fsck didn't get run on hdc3 itself and it "fixed" that last part (hope not since it may have done some real superblock damage). hdc itself looks ok since hdc1 doesn't seem to have any problems, so I don't think it's an actual drive problem. Unfortunately, since it appears that the raid superblock (at a minimum) is broken on hdc3, the only thing I can think to recommend is - mkraid --force /dev/md1 (rewrites raid superblocks) - try to raidstart /dev/md1 (and hope that the real data is ok) - mount -o ro /dev/md1 /mnt (see if it looks ok) There is the chance that the partition table got slightly corrupted and hdc3's entry has an incorrect value (unlikely, though, since the size matches hda4). Make sure your raidtab matches md1's actual devices before running the --force, of course. Note that "normally" the superblock checksum is fine and the update counter is only a few off from the most recent, so I want to stress that if there is something strange wrong (like a partition table screwup), the writing of the raid superblocks can corrupt data. If this all makes you nervous, feel free to see what others may recommend... I've certainly never dealt with this exact kind of situation before (array recovery attempts for a raid0 array :) James
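Spelled out as commands, the recovery attempt above would be (only after verifying the raidtab matches md1's real members, and with no guarantees for a raid0):
  mkraid --force /dev/md1    # rewrites the raid superblocks
  raidstart /dev/md1         # if autostart hasn't already brought it up
  cat /proc/mdstat           # confirm md1 is active
  mount -o ro /dev/md1 /mnt  # read-only first, to see whether the data survived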
Re: HELP!!! Broken raid0
[Matthew Burke] > e2fsck 1.18, 11-nov-1999 for EXT2 FS 0.5b, 95/08/09 > e2fsck: Attempt to read block from filesystem resulted in short read while > trying to open /dev/md1 > Could this be a zero-length partition? > > /dev/md1 is not mounted, but it is properly set up in /etc/raidtab > > raidstart /dev/md1 produeces no error message, but fails to do anything. Could you paste /proc/mdstat? If the arrays aren't active, fsck won't be able to do anything on them. If the arrays are indeed inactive, some syslog entries that relate to it (autostart'ing, I'd imagine) could be helpful as well. James
Re: Problems creating RAID-1 on Linux 2.2.15/Sparc64
[Ion Badulescu] > In article <[EMAIL PROTECTED]> you wrote: > > > I am having trouble using Linux RAID on a Sun Ultra1 running > > 2.2.15. > > You need an additional patch, just plain vanilla 2.2.15 + raid-0.90 won't > do on a sparc. Red Hat have it in their 2.2.14-12 source rpm, but I'm > attaching it here, for convenience. Actually, I don't believe he's applied the 0.90 patch on top of 2.2.15, given his /proc/mdstat: > > /proc/mdstat remains constant with the following: > > > > Personalities : [1 linear] [2 raid0] [3 raid1] [4 raid5] > > read_ahead not set > > md0 : inactive > > md1 : inactive > > md2 : inactive > > md3 : inactive So he may want to start out with http://people.redhat.com/mingo/raid-patches/raid-2.2.15-A0 first. James
Re: raid5 disk failure
[Jakob Østergaard] > > Set up a raidtab entry **WITH GREAT CARE** specifying the minimal set as > > above, with the oldest partitions `raid-failed'. Now create the device. > > This will write a new set of consistent PSBs. > > Correct. s/raid-failed/failed-disk/ as per section 6.1 http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO-6.html#ss6.1 James
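For illustration, the relevant part of such a raidtab could look like the following; the devices, disk count, and chunk size are made up, the point is only the failed-disk directive on the stale member:
  raiddev /dev/md0
      raid-level            5
      nr-raid-disks         3
      nr-spare-disks        0
      chunk-size            32
      persistent-superblock 1
      device                /dev/sda1
      raid-disk             0
      device                /dev/sdb1
      raid-disk             1
      device                /dev/sdc1
      failed-disk           2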
Re: Will kernel 2.4 include latest RAID patches?
[Marco Shaw] > The 2.4 kernel tree itself will not, but Linux distributions will. RedHat > has been patching their products since 6.1, so I'm thinking SuSE isn't far > behind. Incorrect. As of 2.3.99-pre8, the merge is (mostly) done, with just a few stragglers left to get cleaned up. Once my 8-way Xeon finishes the find | xargs -P 8 bzip2 -9 I've got running, I'm gonna check KNI support. James
Re: md0 won't let go... (dmesg dump...)
[Harry Zink] > While I appreciate the patch/diff provided by James Manning, I am extremely > weary of applying anything to a system that I don't fully understand - > particularly if it is suffixed by "Who knows..." (shiver). I hadn't had a chance to test it... this one (attached) works (I had forgotten to update the index commands in the hd[i-j] and hd[k-l]) > Now, I just need to make sure all devices are attached as Master devices, on > their own controller port, and then figure out what minor and major to set > them at... *ANY* help in allowing me to better understand how that's done, > or in actually doing this will be appreciated. anything on an "even" device (hdb, hdd, hdf, hdh, hdj, hdl, etc) is a slave. the "odd" ones (hda, hdc, etc) are masters > Alright, maybe it's oversimplified, but I grok that part (that the kernel > needs the proper device files, and that I don't have the device files, and > thus need to create them. Actually, the kernel doesn't need the /dev files... user-space programs (fdisk, for instance, possibly mkraid too, not sure) need them as an interface to the devices in the kernel... devfs may make this picture clearer down the road... or muddier :) > Thanks, and thanks to James Manning as well for finally tracking down what > the core of this problem is. MAKEDEV is historically bad about keeping up with devices.txt, so it's fairly common... those mknod's I gave last time should work too > Is there some utility that will quickly and easily create /dev/ files and > provides qualified questions to assist in properly creating /dev/ files? MAKEDEV is a decent shell script, although it's just glorified mknod wrapping when it comes down to it :) reading devices.txt and a mknod --help is about all that can be done for understanding the /dev entries... as to major/minor and why they're still around, "historical cruft" is about it for now. James --- /dev/MAKEDEVThu Mar 2 16:35:20 2000 +++ /tmp/MAKEDEVWed May 17 13:33:35 2000 @@ -180,7 +180,7 @@ do case "$1" in mem|tty|ttyp|cua|cub) ;; - hd) (for d in a b c d e f g h ; do + hd) (for d in a b c d e f g h i j k l; do echo -n hd$d " " done) ; echo ;; @@ -188,6 +188,8 @@ ide1) echo hdc hdd ;; ide2) echo hde hdf ;; ide3) echo hdg hdh ;; + ide4) echo hdi hdj ;; + ide5) echo hdk hdl ;; sd) echo sda sdb sdc sdd ;; sr) echo scd0 ;; st) echo st0 ;; @@ -621,6 +623,28 @@ major=`Major ide3 34` || continue unit=`suffix $arg hd` base=`index gh $unit` + base=`math $base \* 64` + makedev hd$unit b $major $base $disk + for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20 + do + makedev hd$unit$part b $major `expr $base + $part` $disk + done + ;; + hd[i-j]) + major=`Major ide4 56` || continue + unit=`suffix $arg hd` + base=`index ij $unit` + base=`math $base \* 64` + makedev hd$unit b $major $base $disk + for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20 + do + makedev hd$unit$part b $major `expr $base + $part` $disk + done + ;; + hd[k-l]) + major=`Major ide5 57` || continue + unit=`suffix $arg hd` + base=`index kl $unit` base=`math $base \* 64` makedev hd$unit b $major $base $disk for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20
Re: md0 won't let go... (dmesg dump...)
[Harry Zink] > Not sure what this will help, except confirm again that these volumes aren't > accessible, which was my question to start with. Question is "why?", answer is "no appropriate /dev entries" > [root@gate src]# ls -l /dev/hdj1 > ls: /dev/hdj1: No such file or directory > [root@gate src]# ls -l /dev/hdj > ls: /dev/hdj: No such file or directory > [root@gate src]# ls -l /dev/hdk > ls: /dev/hdk: No such file or directory > [root@gate src]# ls -l /dev/hdk1 > ls: /dev/hdk1: No such file or directory That's why you can't fdisk (just as Gregory has pointed out before)... get those created (see previous note as per MAKEDEV)... default setup is 4 IDE controllers (ide[0-3]) which correspond to the 8 IDE devices hd[a-h]... Judging by /usr/src/linux/Documentation/devices.txt, I'd say the major's for these new devices should be 56 and 57, so my guess would be: mknod /dev/hdj b 56 64 mknod /dev/hdj1 b 56 65 mknod /dev/hdk b 57 0 mknod /dev/hdk1 b 57 1 attached is what might be a working MAKEDEV patch... who knows. Bleah, James --- /dev/MAKEDEVThu Mar 2 16:35:20 2000 +++ /tmp/MAKEDEVWed May 17 11:17:28 2000 @@ -188,6 +188,8 @@ ide1) echo hdc hdd ;; ide2) echo hde hdf ;; ide3) echo hdg hdh ;; + ide4) echo hdi hdj ;; + ide5) echo hdk hdl ;; sd) echo sda sdb sdc sdd ;; sr) echo scd0 ;; st) echo st0 ;; @@ -619,6 +621,28 @@ ;; hd[g-h]) major=`Major ide3 34` || continue + unit=`suffix $arg hd` + base=`index gh $unit` + base=`math $base \* 64` + makedev hd$unit b $major $base $disk + for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20 + do + makedev hd$unit$part b $major `expr $base + $part` $disk + done + ;; + hd[i-j]) + major=`Major ide4 56` || continue + unit=`suffix $arg hd` + base=`index gh $unit` + base=`math $base \* 64` + makedev hd$unit b $major $base $disk + for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20 + do + makedev hd$unit$part b $major `expr $base + $part` $disk + done + ;; + hd[k-l]) + major=`Major ide5 57` || continue unit=`suffix $arg hd` base=`index gh $unit` base=`math $base \* 64`
Re: md0 won't let go... (dmesg dump...)
[Harry Zink] >Doing fdisk /dev/hdf works just fine. >Doing fdisk /dev/hdg or /dev/hdk results in the old 'unable to open >hdj/hdk' ls -l /dev/hd[gk]* ... you may need a later MAKEDEV (or edit yours) to create all the necessary files >Alright, try turning off the RAID again ... raidstop -all or raidstop >/dev/md0. >This generates the following: >raidstop /dev/md0 >/dev/md0: Device or resource busy mounted filesystem... clear processes using it and umount it (show df output too) >So, this time it won't let go of hdj and hdk (I moved the drives >around during the rebuild), which *DO* exist, and whose partition ID I >can't change (even though it is currently blank/unformatted) becaused >I can't use fdisk... > >md0 : active raid0 hdh1[1] hdg1[0] 19806976 blocks 16k chunks md is using hdh1 and hdg1 ... it's not using hdj or hdk If you wish them (hdh1, hdg1) to not get run automatically, fdisk them and set the type back to 83 from fd (the autorun consideration proves all these partitions are still "fd") These are all the same things hashed over before, so no, I don't really expect this email to have any real consequence. *sigh* James
Re: md0 won't let go... (dmesg dump...)
[Tommy] > When reading through this, my first impulse is to say that /dev/hdl isn't > correct. When I recently built a raid5 using 3 promise cards, I found > that in spite of the kernel detecting hdk hdm and hdo, these devices were > NOT built in /dev. In fact, I had to dig into the ide header file to even > find the proper MAJOR node settings for the devices. I'd think that, but he's still not put out the /proc/mdstat I asked for multiple times, and the dmesg output he showed didn't have hdl involved in md0 at all. I don't honestly believe hdl, if it even exists, is even remotely involved in s/w raid. I don't see dmesg output that reports an hdl (/dev entries not affecting the kernel, obviously), either. James
Re: md0 won't let go... (dmesg dump...)
[Harry Zink] > autorun ... > considering hdh1 ... > adding hdh1 ... > adding hdg1 ... > created md0 So hdh and hdg certainly both have partitions and both are set to type fd. fdisk to /dev/hdl would seem to be failing because there is no hdl device. If you're trying to "free" hdg and/or hdh, fdisk their type to 83 instead of fd and they won't autostart. If you're trying to do something with hdl (if it exists), md isn't the problem. James
Re: md0 won't let go...
[Harry Zink] > [root@gate Backup]# raidstop /dev/md0 > /dev/md0: Device or resource busy > > (This is normal, the fs is shared by atalk. I disable atalk) > > [root@gate Backup]# raidstop /dev/md0 > /dev/md0: Device or resource busy > > (Now this is no longer normal. No services or anything else is using the > partition. I made sure no one is logged in to that partition. Still, the > same error.) Based on the above, I'd say your md0 is still mounted as a filesystem. umount it, or if you're having real problems getting it umounted add noauto to fstab options for the fs and the next boot shouldn't mount it and raidstop will work fine. If it's not mounted, and you're getting the above errors, please send df output. James
Re: md0 won't let go...
[Harry Zink] > on 5/10/00 2:30 PM, [EMAIL PROTECTED] at [EMAIL PROTECTED] > wrote: > > You probably need to do a 'raidstop' on md0. Then, maybe you can > > fdisk it? > > Been there, done that. > Makes no difference. It just very persistently holds on to these drives. Are you claiming that /proc/mdstat has the md0 active both before and after running raidstop /dev/md0? Just want to clarify. James
Re: What is the "standard" way to delete RAID devices?
[Dave Meythaler] > I have looked through the Software RAID howto, the Bootable RAID howto, the > docs that come with raidtools 0.90 and the man pages and I haven't been able > to find any way to delete a raid device once it has been created. since a raid device is just a virtual block device over other real devices, it is a little vague what you mean by "delete". But, going by what I think you mean, you'll want to: - rename /etc/raidtab (in case your distro has initscripts which try to activate raidtab entries that aren't active in /proc/mdstat) - raidstop the array(s) (check /proc/mdstat) - if their partition types are "fd", make them "83" or another appropriate value so your autodetect doesn't try to find it (although if the superblock isn't valid it won't start an array anyway) - mke2fs (or whatever else) for giving new roles to your now-unused partitions/drives > I'm trying to get rid of a raid device (RAID 0 or 1) which was created using > the "persistent-superblock" option on Red Hat 6.2 (kernel source 2.2.14-12). The persistent superblock isn't persistent in that manner :) Once the array is raidstop'd, you can mke2fs the partition immediately (I do just that all the time checking performance between disks and a s/w raid of them) > Is there some kind of command/tool to do this that I haven't stumbled > across? It would be nice if the howto could say something on this topic. There could be... it'd be small since the above is about it, but it's Jakob's call. James
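Put together as commands, the teardown sketched above looks roughly like this (md0 and its members are placeholders):
  umount /dev/md0                   # if it was mounted
  mv /etc/raidtab /etc/raidtab.old
  raidstop /dev/md0
  cat /proc/mdstat                  # confirm md0 is no longer active
  fdisk /dev/sda                    # set the old members back to type 83 if they were fd
  mke2fs /dev/sda1                  # or whatever new role the partition gets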
Re: System lockup during raidhotadd
[Ian Morgan] > I can raidhotremove the (simulated-)faulty disk, and then physically remove > it. Next, I put the disk back in physically. I then want to run raidhotadd to > add the disk back into the array and begin reconstruction. > > Problem is, when I run raidhotadd, the system totally locks up solid. I've > tried giving it time to come back to life, but nothing happens even after > several minutes, and the system is so dead that the software watchdog is > also toast. In my experience, any drive manipulation (in terms of what's attached, what's seen by the kernel, etc) that locks up the machine has been strictly a device driver problem. Assuming this is SCSI, it may help to do the add-single-device/remove-single-device commands as per drivers/scsi/scsi.c lines 2389 and 2447 respectively (2.2.15 src). If the initial detachment didn't propagate up the device removal through the driver, the reattachment may have caused some problems (creating data structures already there and populated, scribbling over valid values... who knows). Just a guess. > kernel: 2.2.16pre2 SMP reproducible on 2.2.15 proper? > raid: mingo's raid-2.2.15-A0 > tools: raidtools-19990824-0.90 > > Is this a known problem? Am I using the right procedure to replace a faulty > disk? Would a raidstop/raidstart work? Isn't there a way to replace a drive > without taking the array down? The HOWTO is not very detailed in this area > of reconstruction. It makes it sound like this should all be a no-brainer. James
Re: lilo: Sorry, don't know how to handle device 0x0905
[Martin Munt] > Sorry, don't know how to handle device 0x0905 You can avoid the lilo.conf tricks and just use a normal one (avoiding partition=, disk=, etc) if you used a lilo patch with the raid1 support (lilo.raid1) written by Doug Ledford (thanks Doug!). This list's archives have it, or you can simply fetch the lilo package out of RH 6.1 or 6.2 (alien/rpm2cpio to your distro as needed) This is specific to s/w raid1 since other raid levels don't have the kernel contiguous on a physical disk. James
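Under the patched lilo, the lilo.conf really is just the ordinary one pointed at the md device; a hypothetical minimal example:
  boot=/dev/md0
  image=/boot/vmlinuz
      label=linux
      root=/dev/md0
      read-only
Running lilo then installs the boot block on both underlying raid1 members (assuming the raid1-patched lilo; stock lilo will still give the 0x09xx complaint).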
Re: can't locate module block-major-22
[Jason Lin] > After my raid-1 is up and running I shutdown the > machine and took out one hard disk.(the one without > Linux installed.) Just to see how it behaves. > During reboot it drops to single user mode due to RAID > device error. > > "raidstart /dev/md0" raidstart? eww :) > modprobe: can't locate module block-major-22 > /dev/md0: invalid argument Since the first drive (raid-disk 0) is gone, AFAIK you have to get autostart working by doing partition type fd for hd[ac]7 and enabling autostart in the kernel block device section. The raidstart approach (as per Ingo's post of maybe a week ago) will fail if the first disk is unavailable. Thankfully, there's not much reason to avoid autostart these days. > raiddev /dev/md0 > raid-level 1 > nr-raid-disks 2 > nr-spare-disks 0 > chunk-size 4 > persistent-superblock 1 > > device /dev/hdc7 > raid-disk 0 > > device /dev/hda7 > raid-disk 1 > > > raiddev /dev/md1 > raid-level 1 > nr-raid-disks 2 > nr-spare-disks 0 > chunk-size 4 > persistent-superblock 1 /dev/md1 with no disks defined? Guess it doesn't matter since the operations are being done on md0, but it's strange to see the extra (apparently useless?) stanza there. James
Re: celeron vs k6-2
[Seth Vidal] > I did some tests comparing a k6-2 500 vs a celeron 400 - on a raid5 > system - found some interesting results > > Raid5 write performance of the celeron is almost 50% better than the k6-2. Can you report the xor calibration results when booting them? > Is this b/c of mmx or b/c of the FPU? FPU should never get involved (except the FPU registers getting used during MMX operations). As per Greg's report of the K6-2 having MMX instructions, remember that a chip having instructions doesn't mean they get used. Again, this is something that the xor calibrations should help show, though. MTRR could certainly be another source of additional performance, but I haven't dealt with the K6-2 in any capacity so I don't even know whether it has that capability (although I haven't personally heard of anything not based on the P6 core using MTRR) > I used tiobench in sizes of > than 3X my memory size on both systems - > memory and drives of both systems were identical. If possible, let the resync's finish before testing... this can cause a huge amount of variance (that I've seen in my testing). speed-limit down to 0 doesn't appear to help, either (although the additional seeks to get back to the "data" area from the currently resyncing stripes could be the base cause) When looking from a certain realistic POV, it'd be hard to believe that even a P5 couldn't keep up with the necessary XOR operations... is there anything else on the system(s) fighting for CPU time? James
Re: drive XOR cmd for parity generation
[Bill BAO] > is there anybody doing the parity generation by using the > drive XOR cmd (XDWRITE, XDREAD, XDPWRITE) ? > > we will start this kind of work in Linux raid, > want to know anybody else is also doing the same thing. > we're looking for cooperation. When I last reviewed FC-AL, I have to admit that the benefits of the new SCSI commands XDWRITE and XPWRITE (along with BUILD and REBUILD) fascinated me. I can't (at the moment) see this going into the Linux s/w raid, though, mainly because it's so FC-specific (in my experience) and would appear to violate the abstraction layer that the raid code can exist at now. I'm also not sure how much (if any) it buys you when the raid5 can be dispersed over multiple controllers, multiple PCI busses, etc. Since you've (obviously :) thought and considered this more than I, could I talk you into a brief explanation of what (besides the bus transfers 4->2, h/w raid controller actions 6->1) and how this can help s/w raid? It'll also give you a great chance to alleviate any worries and quell any issues before they get brought up :) Thanks, James
[PATCH] 2.2.14-B1 bug in file raid5.c, line 659
Summary: raid5_error needs to handle the first scsi error from a device and do the necessary action, but silently return on subsequent failures. - 3 h/w raid0's in a s/w raid5 - initial resync isn't finished (not important) - scsi error passed up takes out one of the devices bug triggered is when raid5_error is called passing in a device (sde1) that doesn't match against "disk->dev == dev && disk->operational" (mainly because the disk->operational was already set to 0 13 seconds previously when the first scsi error was passed back and sde1 matched) Since multiple scsi errors getting passed back from the same failure seems valid (multiple commands had been sent, and each will fail in turn), we should simply handle the first one and have raid5_error exit quietly on the later ones (re-doing the spare code execution could possibly even cause big problems for multiple available spares). Patch attached. Personalities : [raid5] read_ahead 1024 sectors md0 : active raid5 sde1[2](F) sdd1[1] sdc1[0] 177718016 blocks level 5, 4k chunk, algorithm 0 [3/2] [UU_] unused devices: log attached. James --- linux/drivers/block/raid5.c.origThu Apr 20 11:27:37 2000 +++ linux/drivers/block/raid5.c Thu Apr 20 11:32:16 2000 @@ -611,23 +611,29 @@ PRINTK(("raid5_error called\n")); conf->resync_parity = 0; for (i = 0, disk = conf->disks; i < conf->raid_disks; i++, disk++) { - if (disk->dev == dev && disk->operational) { - disk->operational = 0; - mark_disk_faulty(sb->disks+disk->number); - mark_disk_nonsync(sb->disks+disk->number); - mark_disk_inactive(sb->disks+disk->number); - sb->active_disks--; - sb->working_disks--; - sb->failed_disks++; - mddev->sb_dirty = 1; - conf->working_disks--; - conf->failed_disks++; - md_wakeup_thread(conf->thread); - printk (KERN_ALERT - "raid5: Disk failure on %s, disabling device." - " Operation continuing on %d devices\n", - partition_name (dev), conf->working_disks); - return -EIO; + /* Did we find the device with the error? */ + if (disk->dev == dev) { + /* Did we handle its failure already? */ + if (disk->operational) { + disk->operational = 0; + mark_disk_faulty(sb->disks+disk->number); + mark_disk_nonsync(sb->disks+disk->number); + mark_disk_inactive(sb->disks+disk->number); + sb->active_disks--; + sb->working_disks--; + sb->failed_disks++; + mddev->sb_dirty = 1; + conf->working_disks--; + conf->failed_disks++; + md_wakeup_thread(conf->thread); + printk (KERN_ALERT + "raid5: Disk failure on %s, disabling device." + " Operation continuing on %d devices\n", + partition_name (dev), conf->working_disks); + return -EIO; + } + /* Don't do anything for failures past the first */ + return 0; } } /* Apr 19 16:02:41 rts-test2 kernel: SCSI disk error : host 3 channel 0 id 2 lun 0 return code = 800 Apr 19 16:02:41 rts-test2 kernel: [valid=0] Info fld=0x0, Current sd08:41: sense key None Apr 19 16:02:41 rts-test2 kernel: scsidisk I/O error: dev 08:41, sector 9296408 Apr 19 16:02:41 rts-test2 kernel: interrupting MD-thread pid 2807 Apr 19 16:02:41 rts-test2 kernel: raid5: parity resync was not fully finished, restarting next time. Apr 19 16:02:41 rts-test2 kernel: raid5: Disk failure on sde1, disabling device. Operation continuing on 2 devices Apr 19 16:02:41 rts-test2 kernel: md: recovery thread got woken up ... Apr 19 16:02:41 rts-test2 kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode Apr 19 16:02:41 rts-test2 kernel: md: recovery thread finished ... 
Apr 19 16:02:41 rts-test2 kernel: md: updating md0 RAID superblock on device Apr 19 16:02:41 rts-test2 kernel: (skipping faulty sde1 ) Apr 19 16:02:41 rts-test2 kernel: sdd1 [events: 0002](write) sdd1's sb offset: 88859008 Apr 19 16:02:41 rts-test2 kernel: sdc1 [events: 0002](write) sdc1's sb offset: 88859008 Apr 19 16:02:41 rts-test2 kernel: . Apr 19 16:02:41 rts-test2 kernel: raid5: restarting stripe
Re: adaptec 2940u2w hangups
Ok, normally I'd not bother with this kind of message, but Brian (Haymore) has been both nice and helpful in my experience, so I'm going to do a little sticking up for him since he's being uselessly railed on :) Note that I specifically hate flaming that doesn't get taken off-list, but as I correct factual error(s), I believe this is still valid for linux-raid. With that said, on with the show. :) [The coolest guy you know] > "Brian D. Haymore" wrote: > > U2W can actually be LVD as well. My Mylex eXtremeRAID 1164 card is U2W > > and LVD so just saying U2W is for sure LVD or SE is wrong. Read the > > manual or read the specs on the manufactures web site. Same message *I* was about to send about my DAC-1164P's too :) > Pardon me for just saying "the U2W" when I meant the entire "2940U2W". This didn't matter. What you *specifically* said in message <[EMAIL PROTECTED]> was > > LVD is for the new U160 protocol and "LVD is for the new U160 protocol" is clearly a board-independent (and factually incorrect) statement. (How incorrect? See below) > And to be fair, you are talking about a card about 10 times more > expensive than the one being discussed in this thread. Don't see what price has to do with fairness here. True the original thread is about the 2940U2W (not that it ends up mattering, see below), but you were responding (initially) to a message that was much more SCSI-generic (active/passive termination WRT the terminator [EMAIL PROTECTED] had bought at a computer store)... but, I digress. > The "Adaptec 2940U2W" does not specifically support LVD like the > "Mylex Extreme RAID 1100 Ultra2 Wide LVD SCSI PCI RAID Controller". Glad you cleared that up... Can you correct Adaptec? I guess they don't know the hardware they build :) http://www.adaptec.com/support/faqs/aha2940u2whardware.html#1 Q: What is the SCSI Card 2940U2W? A: The SCSI Card 2940U2W (or AHA-2940U2W) is the latest in the line of Adaptec PCI host adapters. It has the latest SCSI Ultra2 technology which uses Low Voltage Differential (LVD) circuitry designed into the CMOS to provide a bandwidth that is up to twice the current Ultra speeds and with cable lengths up to 25 meters. They get it "wrong" in other places, too, like listing the 2940U2W under the "Low Voltage Differential / Ultra2 PCI SCSI" section at http://www.adaptec.com/support/files/drivers.html (Don't they know that "LVD is for the new U160 protocol"?) > Adaptec also does not specifically support Linux the way Mylex does. Maybe you talk to Ledford about what level of interaction he has with Adaptec engineers and rethink this statement :) HTH, HAND, (wow, that *was* therapeutic!) James
Re: Combining RAID 0 and RAID 1
[Gregory Leblanc] > > Recovery is a tad simpler with raid1 done at the lower level simply > > because none of the md device ever "dies", just one falls > > into degraded > > and you can skip an mkraid and let normal recovery take over. > > Of course, > > that leaves the raid1 read balancing algorithm (arguably the > > weak point in > > the read performance of 0+1 or 1+0) running in two places > > instead of one. > > Could you elaborate a little? Are you talking about the default 0.90 code, > or patched with Mika's brilliant patch? Theoretically, RAID1+RAID0 should > be extreemly fast for reads, and only a bit slower for writes, assuming that > you're not saturating the bus. Mika's patch is a straightforward one that improves small, random (ie seek-heavy) reads well. I haven't seen it (in my experience) improve large sequential reads to the point of raid0 (just in my testing), but it's an issue Mika and I have hashed over many other times, and it's not worth banging over again on this list. Thankfully, it's now a largely moot issue in the cases I need as madvise(MADV_SEQUENTIAL) is around so I can get async forward page-in's (the main reason I don't care about seq raid1 read perf much anymore, and why I added the mmap/madvise code to tiobench) James
Re: Combining RAID 0 and RAID 1
[Werner Reisberger] > I am wondering if there is a possibility to use RAID 0 and RAID 1 together, > i. e. mirroring two RAID 0 devices? Absolutely. The most common setup appears to be: drives 1+2: md0 (raid0) drives 3+4: md1 (raid0) md0+md1: md2 (raid1) > Two general questions: > > - Are there any instructions for the new raidtools what to do in cases >of disk or power failures? I only found partial outdated hints in the old >HOWTO. The new howto should cover it well (now in the LDP, at http://linuxdoc.org/HOWTO/Software-RAID-HOWTO.html), but for the above scenario, the failing drive should take down the appropriate md device (md0 or md1) and then the md2 device should fall into degraded mode. Regular recovery techniques (sections 5 and 6 cover them well) to get the supporting raid0 device's drive replaced and the device re-mkraid'ed, then raidhotadd to bring back md2. Recovery is a tad simpler with raid1 done at the lower level simply because none of the md devices ever "dies", just one falls into degraded and you can skip an mkraid and let normal recovery take over. Of course, that leaves the raid1 read balancing algorithm (arguably the weak point in the read performance of 0+1 or 1+0) running in two places instead of one. Probably a common enough request to warrant a howto subsection :) > - Is there an archive for this mailing list? If not I could set up one. http://www.mail-archive.com/linux-raid@vger.rutgers.edu/ James
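As a sketch, the raidtab for that layout might read as follows (drive names and chunk sizes are hypothetical):
  raiddev /dev/md0
      raid-level            0
      nr-raid-disks         2
      chunk-size            32
      persistent-superblock 1
      device                /dev/sda1
      raid-disk             0
      device                /dev/sdb1
      raid-disk             1

  raiddev /dev/md1
      raid-level            0
      nr-raid-disks         2
      chunk-size            32
      persistent-superblock 1
      device                /dev/sdc1
      raid-disk             0
      device                /dev/sdd1
      raid-disk             1

  raiddev /dev/md2
      raid-level            1
      nr-raid-disks         2
      nr-spare-disks        0
      chunk-size            4
      persistent-superblock 1
      device                /dev/md0
      raid-disk             0
      device                /dev/md1
      raid-disk             1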
Re: RAID1: how to control which disk is syn'ed to which.
[Jason Lin] > After installing RedHat6.1 on /dev/hda > I added 2nd hard disk, /dev/hdc, which has same > capacity as /dev/hda. > Then a RAID1 device, /dev/md0, was created with > /dev/hda2 and /dev/hdc2 as the constituent partitions. > (/dev/hda2 contains data for /home) > > Is there a way to control which disk is syn'ed to > which? "failed-disk" directive. As an example, you can check out "Method 2" of the "Root filesystem on RAID" section (although in your case it's /home so life is a little easier) at http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO-4.html#ss4.12 Jakob: how do you feel about a section that covers the "mirroring of currently existing filesystem" case? mirroring /home when a spare drive becomes available could be quite useful :) James
Re: a raid configuration & questions about battery
[David Konerding] > 4 drives 36gig Ultra 2 SCSI (or LVD? or Ultra 3?) (3 active drives & 1 hot > spare) Make sure to consider 4-drive raid1 as well > From poking around the kernel, and reading some stuff on web sites, and > visiting the vendor websites, it seems like the less expensive cards > ($500-1000) typically don't have a battery backup for the cache on the card. > I was thinking, however, that > the UPS makes the cache battery unecessary. Is this a valid belief? Or is > there a situation where having the battery backup > is a good idea? I personally trust my UPS just fine. battery-backed write cache is (IMHO) more a check-mark on Draconian TPC-type auditing to ensure recovery capability. > Also, exactly what will having SAF/TE support on the card and the drive > enclosure gain me? Any pointers to SAF/TE documentation online would be > appreciated. http://www.safte.org/ > Will I save a lot of $$$ by eliminating the requirement for hot-swap and > SAF/TE on the rackmount enclosure? Probably not, and when things go bad, life is much easier with a nice SAF-TE compliant enclosure to work with. James
Re: IO-APIC interrupts (was System Hangs -- Which Is...)
[[EMAIL PROTECTED]] > I'm in the same boat. How do you enable IO-APIC support in the > kernel? CONFIG_SMP implies it, and recent 2.3.x (may have been backported) will allow a UP kernel to use IO-APIC (Ingo's work) although I haven't seen a machine (personally) where that's helpful :) > What is MTRR and how is it enabled? CONFIG_MTRR=y Snipped from Documentation/Configure.help: MTRR control and configuration CONFIG_MTRR On Intel P6 family processors (Pentium Pro, Pentium II and later) the Memory Type Range Registers (MTRRs) may be used to control processor access to memory ranges. This is most useful when you have a video (VGA) card on a PCI or AGP bus. Enabling write-combining allows bus write transfers to be combined into a larger transfer before bursting over the PCI/AGP bus. This can increase performance of image write operations 2.5 times or more. This option creates a /proc/mtrr file which may be used to manipulate your MTRRs. Typically the X server should use this. This should have a reasonably generic interface so that similar control registers on other processors can be easily supported. The Cyrix 6x86, 6x86MX and M II processors have Address Range Registers (ARRs) which provide a similar functionality to MTRRs. For these, the ARRs are used to emulate the MTRRs, which means that it makes sense to say Y here for these processors as well. The AMD K6-2 (stepping 8 and above) and K6-3 processors have two MTRRs. The Centaur C6 (WinChip) has 8 MCRs, allowing write-combining. All of these processors are supported by this code. The Centaur C6 (WinChip) has 8 MCRs, allowing write-combining. These are supported. Saying Y here also fixes a problem with buggy SMP BIOSes which only set the MTRRs for the boot CPU and not the secondary CPUs. This can lead to all sorts of problems. You can safely say Y even if your machine doesn't have MTRRs, you'll just add about 9K to your kernel. See Documentation/mtrr.txt for more information. James Manning
Re: RAID5 array not coming up after "repaired" disk
[Marc Haber] > |autorun ... > |considering sde7 ... > |adding sde7 ... > |adding sdd7 ... > |adding sdc7 ... > |adding sdb7 ... > |adding sda7 ... > |created md0 Ok, maybe I'm on crack and need to lay off the pipe a little while, but it appears that sdf7 doesn't have a partition type of "fd" and as such isn't getting considered for inclusion in md0. sde7 failure + lack of available sdf7 == 2 "failed" disks == dead raid5 James, waiting for the inevitable smack of being wrong
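P.S. If that guess is right, the fix is just setting the partition type (device and partition number here taken from the guess above, so double-check against the real layout):

    fdisk -l /dev/sdf          # confirm sdf7's current type
    fdisk /dev/sdf             # then: t, 7, fd, w  -- set partition 7 to type fd (Linux raid autodetect)

then reboot (or raidstart) so autorun can consider it.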
Re: raidtools-0.90 ioctl
[Michael T. Babcock] > And where can I find err # 22 ... or is it not defined yet? Defined in <errno.h> as EINVAL James
Re: Software RAID with kernel 2.2.14
[flag] > And if I get the same msg when I try to build a raid 0? (my kernel > is RAID patched: 2.2.14) > > [flag@Luxor flag]$ cat /proc/mdstat > Personalities : [1 linear] [2 raid0] [3 raid1] [4 raid5] > read_ahead not set > md0 : inactive > md1 : inactive > md2 : inactive > md3 : inactive allan's right, this is an unpatched kernel. James
Re: newbie needs help
[Wolfram Lassnig] > I´m using a SuSE 6.3, Linux version 2.2.13 ([EMAIL PROTECTED]) > > is it the wrong kernel patch (SuSE does not respond on my queries) SuSE doesn't patch their kernels Excellent software raid howto: http://linuxdoc.org/HOWTO/Software-RAID-HOWTO.html kernel 2.2.14 patch: http://people.redhat.com/mingo/raid-patches/raid-2.2.14-B1 James
[PATCHES] Re: mkraid secret flag
Patches attached: #1: allan noah's suggestion (small warning, 5 seconds, that's it) #2: untested "it compiles" patch for warning file (with Seth's 2 week recommendation on time-span) [ Saturday, March 18, 2000 ] m. allan noah wrote: > think about it! rm by default does not -i! true, although most systems (just going by RH's volume) have alias rm="rm -i" for root (as well as a couple of other possibly-destructive commands) > i feel that mingo/gadi et al have done a fine job, and these utils need to > take the same approach as other system level programs- no convoluted messages > asking for non-disclosure, just the normal warning, and the five second pause. > raid 0.90 is almost grown up. it should act that way. raid 0.90 maturity is orthogonal to the issue of whether we want to warn people on a potentially destructive command. The motivation "It really sucks to LOSE DATA!" applys equally well to Bug-Free (tm) kernel code as to stuff in development (ie, you're willing to destroy what's on disk). In any case, since the patches are small and easy to get almost any warning behavior desired (or none at all), it'll boil down to distro preference anyway. James --- raidtools-0.90/mkraid.c.origSun Mar 19 03:31:48 2000 +++ raidtools-0.90/mkraid.c Sun Mar 19 03:33:46 2000 @@ -68,7 +68,6 @@ int version = 0, help = 0, debug = 0; char * configFile = RAID_CONFIG; int force_flag = 0; -int old_force_flag = 0; int upgrade_flag = 0; int no_resync_flag = 0; int all_flag = 0; @@ -79,8 +78,7 @@ enum mkraidFunc func; struct poptOption optionsTable[] = { { "configfile", 'c', POPT_ARG_STRING, &configFile, 0 }, - { "force", 'f', 0, &old_force_flag, 0 }, - { "really-force", 'R', 0, &force_flag, 0 }, + { "force", 'f', 0, &force_flag, 0 }, { "upgrade", 'u', 0, &upgrade_flag, 0 }, { "dangerous-no-resync", 'r', 0, &no_resync_flag, 0 }, { "help", 'h', 0, &help, 0 }, @@ -116,12 +114,8 @@ } } else if (!strcmp (namestart, "raid0run")) { func = raid0run; - if (old_force_flag) { - fprintf (stderr, "--force not possible for raid0run!\n"); - return (EXIT_FAILURE); - } if (force_flag) { - fprintf (stderr, "--really-force not possible for raid0run!\n"); + fprintf (stderr, "--force not possible for raid0run!\n"); return (EXIT_FAILURE); } if (upgrade_flag) { @@ -167,23 +161,6 @@ if (getMdVersion(&ver)) { fprintf(stderr, "cannot determine md version: %s\n", strerror(errno)); - return EXIT_FAILURE; -} - -if (old_force_flag && (func == mkraid)) { - fprintf(stderr, - -"--force and the new RAID 0.90 hot-add/hot-remove functionality should be\n" -" used with extreme care! If /etc/raidtab is not in sync with the real array\n" -" configuration, then a --force will DESTROY ALL YOUR DATA. It's especially\n" -" dangerous to use -f if the array is in degraded mode. \n\n" -" PLEASE dont mention the --really-force flag in any email, documentation or\n" -" HOWTO, just suggest the --force flag instead. Thus everybody will read\n" -" this warning at least once :) It really sucks to LOSE DATA. If you are\n" -" confident that everything will go ok then you can use the --really-force\n" -" flag. 
Also, if you are unsure what this is all about, dont hesitate to\n" -" ask questions on [EMAIL PROTECTED]\n"); - return EXIT_FAILURE; } --- raidtools-0.90/mkraid.c.origSun Mar 19 03:31:48 2000 +++ raidtools-0.90/mkraid.c Sun Mar 19 03:55:19 2000 @@ -68,7 +68,6 @@ int version = 0, help = 0, debug = 0; char * configFile = RAID_CONFIG; int force_flag = 0; -int old_force_flag = 0; int upgrade_flag = 0; int no_resync_flag = 0; int all_flag = 0; @@ -79,8 +78,7 @@ enum mkraidFunc func; struct poptOption optionsTable[] = { { "configfile", 'c', POPT_ARG_STRING, &configFile, 0 }, - { "force", 'f', 0, &old_force_flag, 0 }, - { "really-force", 'R', 0, &force_flag, 0 }, + { "force", 'f', 0, &force_flag, 0 }, { "upgrade", 'u', 0, &upgrade_flag, 0 }, { "dangerous-no-resync", 'r', 0, &no_resync_flag, 0 }, { "help", 'h', 0, &help, 0 }, @@ -116,12 +114,8 @@ } } else if (!strcmp (namestart, "raid0run")) { func = raid0run; - if (old_force_flag) { - fprintf (stderr, "--force not possible for raid0run!\n"); - return (EXIT_FAILURE); - } if (force_flag) { - fprintf (stderr, "--really-force not possible for raid0run!\n"); + fprintf (stderr, "--force not possible for raid0run!\n"); return (EXIT_FAILURE); } if (upgrade_flag) { @@ -170,8 +164,17 @@ return EXIT_FAILURE; } -if (old_force_flag && (func == mkraid)) { - fprintf(stderr, +if (force_flag
Re: Patch Application Problem
[ Saturday, March 18, 2000 ] Brian Lavender wrote: > I am trying to apply the raid patch to the 2.2.14 kernel > and I get this error. What is wrong? 1) Great reason to use --dry-run with patch so you can spot possible problems before writing to your source tree. > everest:/usr/src/linux# patch -p1 < raid-2.2.14-B1.patch > patching file `init/main.c' > Hunk #2 FAILED at 488. > Hunk #3 succeeded at 940 with fuzz 2 (offset 12 lines). > Hunk #4 FAILED at 1438. > 2 out of 4 hunks FAILED -- saving rejects to init/main.c.rej > patching file `include/linux/raid/linear.h' > patching file `include/linux/raid/hsm_p.h' > patching file `include/linux/raid/md.h' > patch: malformed patch at line 411: rint_devices(); } 2) In every other case like this it's been a corrupted download (lynx print, netscape save as, whatever), so I'd probably recommend something along the lines of wget, snarf, greed, etc. James
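P.S. The --dry-run dance, for the archives (using the filename from the report above -- adjust the path to wherever the patch actually lives):

    cd /usr/src/linux
    patch -p1 --dry-run < /path/to/raid-2.2.14-B1.patch   # only reports, never writes
    # if and only if that shows no FAILED hunks:
    patch -p1 < /path/to/raid-2.2.14-B1.patch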
mkraid secret flag
[ Wednesday, March 15, 2000 ] root wrote: > > mkraid --**-force /dev/md0 /me attempts to get the Stupid Idea Of The Month award Motivation: trying to keep the Sekret Flag a secret is a failed effort (the number of linux-raid archives, esp. those that are searchable, makes this a given), and a different approach could help things tremendously. *** Idea #1: How about --force / -f look for $HOME/.md_force_warning_read and if not exists: - print huge warning (and beep thousands of times as desired) - creat()/close() the file if exists: - Do the Horrifically Dangerous stuff Benefit: everyone has to read at least once (or at a minimum create a file that says they've read it) Downside: adds a $HOME/ entry, relies on getenv("HOME"), etc. *** Idea #2: --force / -f prints a warning, prompts for input (no fancy term tricks), and continues only on "yes" being entered (read(0,..) so we can "echo yes |mkraid --force" in cases we want it automated). Benefit: warning always generated Downside: slightly more complicated to script Both are fairly trivial patches, so I'll be glad to generate the patch for whichever (if either :) people seem to like. James
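P.S. Idea #2 is really nothing more than the sketch below (plain C, not the actual mkraid patch, and the device name is only for show):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* return non-zero only if the user (or a pipe) feeds us "yes" on stdin */
    static int confirm_destroy(const char *dev)
    {
        char answer[16];
        ssize_t n;

        fprintf(stderr,
                "WARNING: initializing %s will destroy any data on its disks.\n"
                "Type \"yes\" to continue: ", dev);
        n = read(0, answer, sizeof(answer) - 1);   /* fd 0, so "echo yes |" still works */
        if (n <= 0)
            return 0;
        answer[n] = '\0';
        return strncmp(answer, "yes", 3) == 0;
    }

    int main(void)
    {
        if (!confirm_destroy("/dev/md0")) {
            fprintf(stderr, "aborted.\n");
            return 1;
        }
        /* ... the Horrifically Dangerous stuff would go here ... */
        return 0;
    }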
Re: IBM ServeRAID Benchmark
[ Tuesday, March 14, 2000 ] Christian Robottom Reis wrote: > Just FYI, a run on a Netfinity 5000 with a ServeRAID card and two IBM 8G > LVD disks plugged into a backplane. I can dig up the model if it makes > things more meaningful. mem=16M, runlevel 1, numruns 5.. you know the > drill. AFAICS to me the ServeRAID is LVD as well, which should give us > 80Mb/s max theoretical throughput. which backplane? first-rev timpani enclosure (just an ibm repackage after buying a company) had problems that put a limit around 11-12 MB/sec in what you could get (which made my tracing efforts take a *long* time at 16GB per trace :) later rev and piano should have that fixed. Also make sure you use the *latest* possible firmware on ServeRAID cards. I finally got a good benchmarking and firmware analysis system cooked up for them, but they've only been using it for the past few months, so later versions have gotten much better. (overview at http://sublogic.com/autotrace/ with visual explanation in the slide at http://sublogic.com/autotrace/slides/sld002.htm) Might wanna try later ips drivers if possible... it's still a fairly new driver, and should be improving still.
> Size is MB, BlkSz is Bytes, Read and Write are MB/sec, Seeks are Seeks/sec
>
> Dir    Size  BlkSz  Thr#  Read (CPU%)      Write (CPU%)     Seeks (CPU%)
> ------ ----- ------ ----- ---------------- ---------------- ---------------
> /usr/   512   4096     1  11.3997   4.96%   6.99304   4.27%  149.242  0.77%
> /usr/   512   4096     2  11.8671   5.47%   6.95879   4.25%  195.759  0.97%
> /usr/   512   4096     4  12.2617   5.69%   6.94252   4.27%  223.820  1.14%
> /usr/   512   4096     8  12.3979   5.78%   6.93575   4.29%  250.433  1.41%
> /usr/   512   4096    16  12.3850   5.82%   6.93202   4.32%  277.247  1.44%
> /usr/   512   4096    32  12.1949   5.82%   6.92113   4.34%  297.975  1.50%
> /usr/   512   4096    64  11.7323   5.81%   6.87402   4.37%  314.251  1.59%
I hate the Write field... it's such a lie :) it's not "multi-threaded" it's "single-threaded with (thread#-1) pauses"... ugh, that's going to get changed. James
Re: Old RAID HOWTO query?
[ Monday, March 13, 2000 ] Gregory Leblanc wrote: > What version of the RAIDtools and kernel drivers does the old > Software-RAID-HOWTO apply to? I need to make sure I've got it right. The coded checks were < 0.90, but the latest to ever show up was 0.50beta3 (kernel.org/pub/linux/daemons/raid/) James
in search of good gnuplot output
As tiotest's funnyscripts/ directory is largely (if not wholly) outdated and broken, I've tried a first-pass perl script replacement for makeimages.sh that takes the same params as the tiobench.pl perl script and makes a gnuplot output. Currently only plots the read performance (will be fairly easy to extend later)... it's currently intentionally fairly simple until output format(s) are stable. This is mainly to solicit input on what valuable gnuplot output could look like. I'm not against surface plots, but trying to figure out good x, y, and z variable selections for them hasn't been working well for me :) Example output from this command: funnyscripts/makeimages.pl --threads 1 --threads 2 --threads 4 --threads 6 --threads 8 --threads 10 --threads 12 --threads 16 --threads 20 --threads 24 --dir /tmp --dir /src is located here: http://sublogic.com/reads.png James

#!/usr/bin/perl -w
#Author: James Manning <[EMAIL PROTECTED]>
# This software may be used and distributed according to the terms of
# the GNU General Public License, http://www.gnu.org/copyleft/gpl.html
#
#Description:
# Perl wrapper for calling tiobench.pl and displaying results
# graphically using gnuplot

use strict;

my $args = join(" ",@ARGV);
my %input_fields;
my %output_fields;
my %values_present;
my %data;
my $dir;
my $size;
my $blk;
my $thr;
my $read;
my $read_cpu;
my $field;
my $write;
my $write_cpu;
my $seek;
my $seek_cpu;

open(TIO,"tiobench.pl $args 2> /dev/null |") or die "failed on tiobench";
while(<TIO> !~ m/^---/) {} # get rid of header stuff
while(my $line = <TIO>) {
    $line =~ s/^\s+//g; # remove any leading whitespace
    ($input_fields{'dir'},$input_fields{'size'},
     $input_fields{'blk'},$input_fields{'thr'},
     $output_fields{'read'}, $output_fields{'read_cpu'},
     $output_fields{'write'}, $output_fields{'write_cpu'},
     $output_fields{'seek'}, $output_fields{'seek_cpu'}
    ) = split(/[\s%]+/, $line);
    foreach $field (keys %input_fields) { # mark values that appear
        $values_present{$field}{$input_fields{$field}}=1;
    }
    foreach $field (keys %output_fields) { # mark values that appear
        $data{$input_fields{'dir'}}{$input_fields{'thr'}}{$field}
            =$output_fields{$field};
    }
}

my $gnuplot_input = "\n".
    "set terminal png medium color;\n".
    "set output \"reads.png\";\n".
    "set title \"Reads\";\n".
    "set xlabel \"Threads\";\n".
    "set ylabel \"MB/s\";\n".
    "plot ";

my @gnuplot_files;
foreach my $dir (sort keys %{$values_present{'dir'}}) {
    my $file="read_dir=$dir";
    $file =~ s#/#_#g;
    push(@gnuplot_files,"\"$file\" with lines");
    open(FILE,"> $file") or die $file;
    foreach my $thr (sort {$a <=> $b} keys %{$values_present{'thr'}}) {
        print FILE "$thr $data{$dir}{$thr}{'read'}\n";
        print "DEBUG: $thr $data{$dir}{$thr}{'read'}\n";
    }
    close(FILE);
}
$gnuplot_input .= join(", ",@gnuplot_files) . ";\n";

print "DEBUG: feeding gnuplot $gnuplot_input";
open(GNUPLOT,"|gnuplot") or die "could not run gnuplot";
print GNUPLOT $gnuplot_input;
close(GNUPLOT);
Re: tiotest on SMP systems...
[ Saturday, March 11, 2000 ] Gregory Leblanc wrote: > I've got a dual proc SS20 that I'm using at my toy here. I'm running > tiobench/tiotest on this machine to test out the raw performance of these > disks, but I was sort of wondering what that (CPU%) number means on an SMP > machine. Does it represent XX% of the total CPU cycles available are being > used, or does it represent that XX% of the 1 CPU's cycles are being used? > Seems to me that the threading would allow it to easily split onto multiple > CPUs, but then what does the (CPU%) represent on the single threaded test? the CPU % is in terms of a single CPU. the below is on my home dual celery
[root@ns1 tiotest-0.25]# ./tiobench.pl --size 16
Size is MB, BlkSz is Bytes, Read and Write are MB/sec, Seeks are Seeks/sec
Dir    Size  BlkSz  Thr#  Read (CPU%)      Write (CPU%)     Seeks (CPU%)
------ ----- ------ ----- ---------------- ---------------- ---------------
.        16   4096     1  242.571   90.9%  6.00456   7.88%  53944.7  97.9%
.        16   4096     2  269.951   143.%  5.97565   8.21%  61718.8  138.%
.        16   4096     4  279.769   157.%  5.94349   8.04%  64585.5  156.%
.        16   4096     8  284.229   164.%  5.81558   7.81%  66145.7  165.%
James
Re: raid0145-19990824-2.2.11.gz
[ Thursday, March 9, 2000 ] Arthur Erhardt wrote: > I just tried to patch a Linux 2.2.14 kernel For 2.2.14 apply http://www.redhat.com/~mingo/raid-patches/raid-2.2.14-B1
Re: patch fails
[ Thursday, March 9, 2000 ] Frank Joerdens wrote: > After trying to apply raid0145-19990824-2.2.11 to a 2.2.13 kernel > > /usr/src/linux/arch/i386/defconfig.rej > /usr/src/linux/arch/sparc64/kernel/ioctl32.c.rej > /usr/src/linux/drivers/block/ll_rw_blk.c.rej > /usr/src/linux/include/asm-ppc/md.h.rej Safe to ignore, as is the one or two you get applying to 2.2.12 > I also tried patching a 2.0.36, a 2.2.14 and a 2.2.12 kernel, all with > similar results. Don't bother with 2.0.36 For 2.2.14 apply http://www.redhat.com/~mingo/raid-patches/raid-2.2.14-B1 James
Re: how to test the performance ?
[ Thursday, March 9, 2000 ] octave klaba wrote: > I see in some emails the tables with the tests: > cpu charge, Mb/sec etc the nicely formatted tables come out of the perl script tiobench.pl in the tiotest package mirrored at http://sublogic.com/tio/ (at least until Mika gets his moving finished :) There's also bonnie at http://www.textuality.com/bonnie/ although for raid or drive testing, I'm not sure what bonnie buys you over tiobench's single-threaded test run... hmmm James
Re: question on raid
[ Thursday, March 9, 2000 ] Benny HO wrote: > I am trying to setup a linear mode to expand my drive. > > I did exactly what is said in the How-to doc. Which one? The LDP one is (checking as I write this) outdated. http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ > Then I run " mkraid /dev/md0" > It returns > Destroying the contents of the /dev/md0 in 5 seconds.. > Handling MD device /dev/md0 > analyzing super-block > disk 0: /dev/hda6 . > disk 1: /dev/hdb1 . > > /dev/md0 Invalid argument Could you dump out anything that showed up in /var/log/messages (at the end of it) or relevant things at the end of "dmesg" output? Could you also include the contents of your /proc/mdstat? Could you also include the contents of your /etc/raidtab? > I am running RedHat Linux 6.0 with kernel 2.2.5-15 I actually don't remember whether that kernel was patched or unpatched, and I've never done linear mode so I'm not sure there's a huge difference (although using old mdtools vs. new raidtools is one obvious one) James
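P.S. For comparison, a linear-mode raidtab for the new tools would look something like the sketch below (partition names from the mail above; the chunk-size value is just a placeholder mkraid insists on seeing) -- worth diffing against whatever the real raidtab says:

    raiddev /dev/md0
        raid-level              linear
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              32
        device                  /dev/hda6
        raid-disk               0
        device                  /dev/hdb1
        raid-disk               1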
Re: Benchmarking.. how can I get more out of my box?
[ Tuesday, March 7, 2000 ] Matthew Clark wrote: > Hey guys.. I just installed and ran iozone.. neat tool.. > > When the file size reaches 32Mb, I see a huge drop from around 129Mb/sec > (obviously caching effects) right down to 10Mb/sec... then at 64Mb it drops > to between 2.5 and 6.7 Mb/sec depending on record/block size... Could you try bonnie (textuality.com/bonnie) or tiotest (mirror available at sublogic.com/tio that includes the mmap code as 0.25)? The second opinions they offer would be interesting to see. > I have a Dual Intel PIII 500 system with 256Mb of main Memory... It has a > Hardware RAID 5 system on 5 18 Gb Seagate Barracuda drives spread over 3 LVD > SCSI channels on a Megaraid controller. I have the latest megaraid source > (1.05) from ami.com. what parameters did you use making the h/w array? (write-through vs write-back, stripe size, etc) James
Re: SW-Raid1 over network block devices
[ Monday, March 6, 2000 ] Holger Kiehl wrote: > node2: 2 x PII-350 128MB with 5 disks used as one single > SW-Raid5, kernel 2.2.14 + mingos patch could you try 2 things? 1) UP kernel 2) kernel 2.3.30 (SMP and then UP if still locks) > Is it a problem that /dev/nd1 lies on another SW-Raid? ie. Part of a raid1 > on top of a raid5. nbd's been historically flaky, with local-loopback, UP kernel situations being the only really tested scenario :) James
Re: RaidTools won't compile correctly
[ Sunday, March 5, 2000 ] Slip wrote: > And suggestions greatly appreciated! You may want to read the new Software-RAID howto at http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ Specifically section 1.2 "Requirements" Note that the current, supported raid uses raidtools-0.90 (it's in the "alpha" subdirectory from that place you got everything else). Note that it will require a kernel patch, but trust us, you'll thank us later :) James
Re: autorun
[ Saturday, March 4, 2000 ] Steve wrote: > request_module[md-personality-3]: Root fs not mounted It would appear that you'd need to build in the raid level support instead of making it a module. Main problem being that since root's not mounted (chicken-and-egg in this case), you have nowhere to load the correct module from. Hence you'll need to rebuild the kernel and build in the necessary support instead of having it as a module. James
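P.S. In .config terms the difference is just =y vs =m for the personality the array uses (md-personality-3 is raid1, so that's what's shown -- option names here are as they appear in 2.4-era trees, so double-check the exact spelling in a patched 2.2 config):

    CONFIG_BLK_DEV_MD=y
    CONFIG_MD_RAID1=y      # built in, not =m, so the root array can start before any fs is mounted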
Re: Problem with 2.2.x and RAID0
[ Saturday, March 4, 2000 ] Martin Schulze wrote: > I wonder why I can't get RAID0 aka striping work with 2.2.13. It only > runs with 2.0.36. old-style raid is no longer supported. You may wish to read the s/w raid howto at http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ specifically, the "requirements section" (1.2) quick summary: patch kernel, get new raidtools, create raidtab, mkraid > # mdadd -ar > /dev/sdc2: No such device > /dev/sdd2: No such device > /dev/sde2: No such device > /dev/md0: No such device > > The appropriate SCSI driver is included, /dev/sda1 can be mounted without > a problem. As you can see, the MD driver is also included, thus it should > work. it still would appear that you have no valid sd[cde]2. perhaps fdisk -l /dev/sd[cde] output so we can see the partitions on those drives? also helpful would be your raidtab (mdtab in this case) contents and /proc/mdstat output My guess would be either /dev/sd[cde] aren't valid drives (for whatever reason) or they only have a single partition. Shot in the dark, of course, as there's not enough information to make a good assessment. Good luck! James
Re: Suggestion for mkraid
[ Friday, March 3, 2000 ] James Manning wrote: > [ Friday, March 3, 2000 ] Sander Flobbe wrote: > > In my kernel I did only include the module for raid-1. Then, when I try > > to create a raid-5 system it doesn't work: > > > > Okay, okay, my fault... but a tiny little cute hint about my mistake > > from mkraid would be nice, wouldn't it? :*) > > also nice would be your raidtab contents, /proc/mdstat output, syslog > messages, kernel version, patch used, raidtools used, etc, etc, etc Wow, I *really* needed some sleep *sigh* I can't even blame the crack since I quit last week. :) really, I did.. I swear! really! Ok, there was that one time behind the garage! shut up already! Yes, better and more descriptive error messages are always a good thing. After the merge is successfully done, that'd be a good priority for making sure s/w raid is as friendly as possible for 2.4-based distros. James
Re: kernel 2.3.4X raid0 performance problems
[ Friday, March 3, 2000 ] Karl Czajkowski wrote: > > how much memory in the machine? > > 256 MB > dual 550 MHz pentium III > > I did read other larger-than-memory files in between tests to try and > avoid caching effects. barely larger than memory doesn't count. It's easily argued that 2x memory isn't even good enough either :) 3x is really about the time it gets safe. Sadly, this will remain the case until we can avoid caching altogether, something I'm hoping (hey, someone tell me if this is a pipe dream :) mmap/madvise can do for us. James
Re: kernel 2.3.4X raid0 performance problems
[ Friday, March 3, 2000 ] Karl Czajkowski wrote: > I upgraded the kernel to 2.3.47, 48, and 49 and got a performance > problem where "time cat file ... > /dev/null" for a 300 MB file shows > some scaling, but for a 600 MB file the throughput is almost identical to > a single disk. how much memory in the machine? > is there a known scheduling problem with the 2.3.4X kernel raid vs. the > 2.2.12-20 patches distributed by redhat? I need the new kernel for > ethernet patches... The 2.3.4x raid merge isn't finished yet, but I'm surprised raid0 isn't working as well as it sounds like it should. > I also noticed that the "boot with raid" option in the kernel won't compile > properly in the 2.3.4X series. It should once the merge is finished. James
Re: 16/02 Raid1 Benchmark
[ Friday, March 3, 2000 ] Ricky Beam wrote: > As I understand it, the "stride" will only make a real difference for > fsck by ordering data so it's (more) evenly spread over the array. This > sounds correct and even "looks" correct when observing the array -- but > I've never bothered to look at the file system handling of striding. I'd always imagined it allowed the ext2 layer to aggregate data blocks (the number to aggregate being the stride param) before passing the blocks to the md layer, making things more efficient since the md layer wouldn't have to do the same aggregation and could simply pass down a single block. Nothing based on looking through code, just an impression. It'd be good to know, actually :) Seems like you'd ideally like ext2 to pass down the data in full-stripe sizes, but that could be asking a bit much. James
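P.S. For anyone following along, the stride value is just handed to mke2fs when the filesystem is created; an illustrative run (a hypothetical 32k-chunk array with 4k ext2 blocks, so stride = 32/4 = 8 -- the numbers are made up for the example):

    mke2fs -b 4096 -R stride=8 /dev/md0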
Re: Suggestion for mkraid
[ Friday, March 3, 2000 ] Sander Flobbe wrote: > In my kernel I did only include the module for raid-1. Then, when I try > to create a raid-5 system it doesn't work: > > Okay, okay, my fault... but a tiny little cute hint about my mistake > from mkraid would be nice, wouldn't it? :*) also nice would be your raidtab contents, /proc/mdstat output, syslog messages, kernel version, patch used, raidtools used, etc, etc, etc I don't get my mind-reading certification until next semester. :) James
Re: autorun
[ Friday, March 3, 2000 ] Steve Terrell wrote: > I have been using raid1 0.090-5 (kernel 2.2.14 w/ raid patch) on a > couple of RedHat 6.1 boxes for several weeks with good results. > Naturally, when I installed it on a production system, I ran into > problems. Raid1 arrays work fine - after the machine (Redhat 6.0 kernel > 2.2.14 w/patch) is up and running. However, autorun does not work even > though autodetect was compiled and the partitions are type fd. > > Anyone got a clue? Did you enable autodetection? paste the "autorun" section of the bootup log. James
Re: What program do I use for benchmarking?
[ Friday, March 3, 2000 ] bug1 wrote:
> there are a few benchmark progs around:
> bonnie: old benchmark program
> bonnie++: updated bonnie to reflect modern hardware
> tiotest: looks promising, still being developed
> iozone: haven't tried this, but www.iozone.org shows it can do pretty graphics, and also has a long feature list.
If anyone happens to know a command-line capability or version of iozone, please let me know... tons of NT benchmarking I could automate the hell out of once I find it :) > tiotest has been getting a lot of attention around here lately, so maybe > you should give it a go. Yes, please! :) tiotest sprang up specifically for s/w raid testing (though it's not specific to that, at least not yet :) and the more pounding we can do, the better. Feedback about USE_MMAP in tiotest and whether it causes any significant changes is also very desired. I'm hoping Linus will finally accept Chuck Lever's mincore() (and later, madvise()) patch, solely so we can possibly get to the point where we can efficiently benchmark without caching effects. This is currently, IMHO, the weakness in all methods of Linux i/o benchmarking... James
Re: are there archives or FAQ's?
[ Thursday, March 2, 2000 ] Derek Shaw wrote: > I've re-compiled the kernel to have md support at RAID-1 included in > ftp.fi.kernel.org/pub/linux/daemons/raid/alpha/ fetch the patch (raid0145) for kernel 2.2.11 and apply it to your kernel source (since you said 2.2.13) and ignore rejects If you decide on 2.2.14, use: http://people.redhat.com/mingo/raid-patches/raid-2.2.14-B1 Since you referenced Jakob's howto, I'll note that this is covered in section 1.2 "requirements" which you appear to have at least partially read based on the ftp location you used. :) James
Re: FW: ExtremeRAID 1100 benchmarks
[ Thursday, March 2, 2000 ] Kenneth Cornetet wrote: > I wished someone would port Bonnie (or tiotest) to NT. ActivePerl + cygwin should work fine... if not, plz report specific issues (some ifdef's on the thread stuff should be about it) I still have the NTiogen re-write I did, and that'll be easy enough to rip code out of. James
Re: ExtremeRAID 1100 benchmarks
[ Thursday, March 2, 2000 ] Chris Mauritz wrote: > Has anyone done any benchmarks with the Mylex ExtremeRAID 1100? I'm > planning on getting one of the 3 channel ones with 64mb cache. Initially, > it will be delivered on a dual PIII-750mhz machine with NT, but I'd like to > repurpose this as a Linux file server. It will have an external enclosure > with 8 18gig 10,000rpm IBM Deskstars and one hot spare. Can anyone hazard a > guess at the kind of performance I can expect from such an array? This is a very similar setup to the 9-disk 10krpm raid5 extremeraid 1100 benchmarks I mailed the list awhile back... search back through some archives. James
tiotest patch to add mmap() and madvise() capabilities
By default not used (ppl just have to edit the DEFINES in their Makefile) but worth getting into the tree now for later tinkering (specifically, madvise() behavior checking and diff memory copy methods). If anyone happens to have or be running a kernel with chuck lever's madvise() patch, please try with and w/o -DUSE_MADVISE. Otherwise, seeing some good read/write vs. mmap() results should be interesting (although I get the feeling I could have done the memory copy's a little better... hmmm) James diff -ru tiotest-0.24/ChangeLog tiotest-0.24.mmap/ChangeLog --- tiotest-0.24/ChangeLog Wed Feb 16 10:25:16 2000 +++ tiotest-0.24.mmap/ChangeLog Thu Mar 2 02:34:14 2000 @@ -88,3 +88,8 @@ * 0.24 - prompt to STDERR and not printing ^H s any more - minor tiobench.pl cleanup by James + +2000-03-02 James Manning <[EMAIL PROTECTED]> + + * 0.25 - add optional use of mmap()-based IO ifdef'd on USE_MMAP + - add optional use of madvise() to control kernel paging USE_MADVISE diff -ru tiotest-0.24/Makefile tiotest-0.24.mmap/Makefile --- tiotest-0.24/Makefile Fri Feb 11 18:25:33 2000 +++ tiotest-0.24.mmap/Makefile Thu Mar 2 02:39:56 2000 @@ -3,6 +3,7 @@ CC=gcc #CFLAGS=-O3 -fomit-frame-pointer -Wall CFLAGS=-O2 -Wall +#DEFINES=-DUSE_MMAP -DUSE_MADVISE DEFINES= LINK=gcc EXE=tiotest diff -ru tiotest-0.24/tiotest.c tiotest-0.24.mmap/tiotest.c --- tiotest-0.24/tiotest.c Wed Feb 16 10:25:30 2000 +++ tiotest-0.24.mmap/tiotest.c Thu Mar 2 02:39:45 2000 @@ -19,7 +19,7 @@ #include "tiotest.h" -static const char* versionStr = "tiotest v0.24 (C) Mika Kuoppala <[EMAIL PROTECTED]>"; +static const char* versionStr = "tiotest v0.25 (C) Mika Kuoppala <[EMAIL PROTECTED]>"; /* This is global for easier usage. If you put changing data @@ -513,23 +513,46 @@ off_t blocks=(d->fileSizeInMBytes*MBYTE)/d->blockSize; off_t i; +#ifdef USE_MMAP +off_t bytesize=blocks*d->blockSize; /* truncates down to BS multiple */ +void *file_loc; +#endif + fd = open(d->fileName, O_RDWR | O_CREAT | O_TRUNC, 0600 ); if(fd == -1) perror("Error opening file"); +#ifdef USE_MMAP +ftruncate(fd,bytesize); /* pre-allocate space */ +file_loc=mmap(NULL,bytesize,PROT_READ|PROT_WRITE,MAP_SHARED,fd,0); +if(file_loc == MAP_FAILED) + perror("Error mmap()ing file"); +#ifdef USE_MADVISE +/* madvise(file_loc,bytesize,MADV_DONTNEED); */ +madvise(file_loc,bytesize,MADV_RANDOM); +#endif +#endif + timer_start( &(d->writeTimings) ); for(i = 0; i < blocks; i++) { +#ifdef USE_MMAP +memcpy(file_loc + i * d->blockSize,buf,d->blockSize); +#else if( write( fd, buf, d->blockSize ) != d->blockSize ) { perror("Error writing to file"); break; } - +#endif d->blocksWrite++; } +#ifdef USE_MMAP +munmap(file_loc,bytesize); +#endif + fsync(fd); close(fd); @@ -547,26 +570,44 @@ intfd; off_t blocks=(d->fileSizeInMBytes*MBYTE)/d->blockSize; off_t i; +#ifdef USE_MMAP +off_t bytesize=blocks*d->blockSize; /* truncates down to BS multiple */ +void *file_loc; +#endif fd = open(d->fileName, O_RDONLY); if(fd == -1) perror("Error opening file"); +#ifdef USE_MMAP +file_loc=mmap(NULL,bytesize,PROT_READ,MAP_SHARED,fd,0); +#ifdef USE_MADVISE +/* madvise(file_loc,bytesize,MADV_DONTNEED); */ +madvise(file_loc,bytesize,MADV_RANDOM); +#endif +#endif + timer_start( &(d->readTimings) ); for(i = 0; i < blocks; i++) { +#ifdef USE_MMAP +memcpy(buf,file_loc + i * d->blockSize,d->blockSize); +#else if( read( fd, buf, d->blockSize ) != d->blockSize ) { perror("Error read from file"); break; } - +#endif d->blocksRead++; } timer_stop( &(d->readTimings) ); +#ifdef MMAP +munmap(file_loc,bytesize); +#endif close(fd); return 
0; diff -ru tiotest-0.24/tiotest.h tiotest-0.24.mmap/tiotest.h --- tiotest-0.24/tiotest.h Fri Feb 4 14:40:27 2000 +++ tiotest-0.24.mmap/tiotest.h Wed Mar 1 14:19:14 2000 @@ -14,6 +14,10 @@ #include #endif +#ifdef USE_MMAP +#include +#endif + #define KBYTE 1024 #define MBYTE (1024*KBYTE)
Re: your mail
[ Wednesday, March 1, 2000 ] Christian Robottom Reis wrote: > James, when run tiotest with a size too small for the number of threads don't do that. (what'd be the purpose?) I'll add a boundary check later... but really, I'm not going to get into the habit of checking all possible inputs against pathological cases. don't do that. James
Re: Testing script
[ Wednesday, March 1, 2000 ] Christian Robottom Reis wrote: [snip] > # time we need to sleep before resync finishes - empirical? > snooze=5m [snip] > sleep $snooze # so the raid1 can sync in peace FWIW, If it's the only thing resync'ing you should be able to do: while grep resync /proc/mdstat > /dev/null; do sleep 10; done you can chain two grep's together or do an egrep pattern if you want to isolate on the particular device (don't have mdstat output during a resync handy at the moment) James
Re: Benchmark 1 [Mylex DAC960PG / 2.2.12-20 / P3]
[ Wednesday, March 1, 2000 ] Christian Robottom Reis wrote: > On Wed, 1 Mar 2000, James Manning wrote: > > per-char doesn't matter (one of the reasons I hate ppl using bonnie, > > besides the single-threaded-ness). Considering the queueing and scat/gat > > Why not? Because usual disk operations are done block by block? That, and because per-char stresses the OS and stdio implementation far more than the drive itself. There was a little rant about it awhile back on lkml or here. James
Re: tiotest, --numruns
[ Wednesday, March 1, 2000 ] Christian Robottom Reis wrote: > I've seen a lot of variation on various runs of tiotest using the same > setup - even in single-user mode. Is this expected, and do you know why it > happens? Is it just the effect of the buffer cache, or do we avoid using > it? We don't avoid using it currently. Since I can find neither a 2.2 nor a 2.3 that has working i386 madvise(), it could be awhile :) > What's a decent --numruns to use, taking into evidence such > variation? I've noticed if I use more than one I get worse numbers in > general - this is ok? If you don't trust numruns > 1, don't use it :) It may be worth watching "vmstat 1" output during a run just so you can get an idea of the memory/caching interaction that's going on. I'd like to believe that higher numruns further reduces the effect of memory... for numruns=1,2,4 my numbers come out pretty close
Dir    Size  BlkSz  Thr#  Read (CPU%)      Write (CPU%)     Seeks (CPU%)
------ ----- ------ ----- ---------------- ---------------- ---------------
.       512   4096     4  6.81976   9.47%  6.95052   10.3%  164.596  1.81%
.       512   4096     4  6.72370   8.35%  6.88223   10.3%  165.602  1.84%
.       512   4096     4  6.69172   7.37%  6.83409   10.5%  169.500  1.89%
James
Re: Benchmark 1 [Mylex DAC960PG / 2.2.12-20 / P3]
[ Wednesday, March 1, 2000 ] Ricky Beam wrote: > > ---Sequential Output ---Sequential Input-- --Random-- > > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- > > MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > > 256 5451 78.7 10035 8.5 4000 7.3 3975 55.3 18765 11.3 262.8 3.9 > > That's a lot of CPU being used for a hardware RAID device. per-char doesn't matter (one of the reasons I hate ppl using bonnie, besides the single-threaded-ness). Considering the queueing and scat/gat the driver is probably trying (not to mention caching, esp. in the read case), 7.3-11.3% seems acceptable for the block stuff. I need to check to see if madvise() has been backported to 2.2.x, as MADV_RANDOM may help cut down or eliminate memory caching effects... it'd be nice to get (approx) the same numbers from 100MB and 1000MB test runs, regardless of the amount of memory in the machines :) James
Re: What version of the raidtools and other patches do I need?
[ Wednesday, March 1, 2000 ] Brian Kress wrote: > Either use your current kernel with that patch or get 2.2.14 > and grab the patch at http://www.redhat.com/~mingo/raid. Or for a working url :) http://www.redhat.com/~mingo/raid-patches/raid-2.2.14-B1 James
Re: RaidZone software raid
[ Tuesday, February 29, 2000 ] Hector Herrera wrote: > Has anyone on this list used any of Raidzone's products? > > http://www.raidzone.com/ Already brought up fairly recently... check some archives such as mail-archive.com or similar James
Re: persistent superblock in HOWTO / raidtools
[ Tuesday, February 29, 2000 ] Brian Lavender wrote: > mammoth:/# mkraid /dev/md0 > unrecognized option peristent-superblock Try using a spell checker :) James
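P.S. For the archives, the line mkraid was looking for is spelled like this in /etc/raidtab:

    persistent-superblock 1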
Re: Benchmark 1 [Mylex DAC960PG / 2.2.12-20 / P3]
[ Tuesday, February 29, 2000 ] Christian Robottom Reis wrote: > /proc/rd/ relevant information: > > * DAC960 RAID Driver Version 2.2.4 of 23 August 1999 * > Copyright 1998-1999 by Leonard N. Zubkoff <[EMAIL PROTECTED]> > Configuring Mylex DAC960PG PCI RAID Controller > Firmware Version: 4.06-0-08, Channels: 1, Memory Size: 4MB Try updating your firmware, it may help Configuring Mylex DAC1164P PCI RAID Controller Firmware Version: 5.07-0-79, Channels: 2, Memory Size: 64MB James
Re: tiotest 0.21/0.24
[ Tuesday, February 29, 2000 ] Christian Robottom Reis wrote: > James, I've run a whole truckload of benchmarks on raid1 with varying > chunksizes on three different kernels, and on a plain disk. I'm about to > publish some of the stuff, but I'm wondering very hard why is it that the > readbalancing test showed _awful_ numbers on tiotest 0.21 and great > numbers on 0.24 - any idea? Just have a look: I'm not going to actually wade through these numbers... just far too many to really deal with :) tiotest.c has only changed cosmetically; tiobench.pl's only non-cosmetic change was in stat calculation to allow for multiple runs in an efficient and harmonic (literally) way. Unfortunately for this case, it becomes the same calculation as before (first number divided by second in tiotest output) for num_runs == 1. Tell ya what, pick out an isolated case which is heavily reproducible, print out the tiobench output, then print out the tiotest output. James
Re: Chunk and Stripe for RAID1
[ Tuesday, February 29, 2000 ] Christian Robottom Reis wrote: > I've got the simple scripts I used to do the benchmarks here and if > somebody wants to have a look, feel free. go ahead and mail them to the list as attachments. Might make for more scripts to shove into tiotest/funnyscripts/ James
Re: strange syslog messages about overlapping physical units
[ Tuesday, February 29, 2000 ] Christian Robottom Reis wrote: > On Mon, 14 Feb 2000, Peter Pregler wrote: > > All is fine but during reconstruction I get a few syslog-messages that I > > simply cannot believe are true. The message in question are: > > > > Feb 12 11:31:52 kludge kernel: md: serializing resync, md8 has overlapping > > physical units with md9! > > Just means both md partitions have component partitions on the same drive > - isn't this in the faq, Jakob? They have to be serialized because the > bandwidth for sync is rather limited and it'd be thrashing to let the > resync go by in parallel. Or restated "you wouldn't want to bother with all the wasted seeks between the two sections of disk, so you serialize the resyncs" James
Re: Adaptec RAID
[ Tuesday, February 29, 2000 ] Andrew G Milne wrote: > I have an Adaptec 4-channel raid controller. I have just got the > drivers from Dell for this card and it turns out that they have been > statically compiled for a specific version of the kernel. I need to use > the raid array as a boot device (which the driver allows) but I don't > have a boot diskette (or CD!) that has this version of the kernel. I > have tried using the version that I have got, but the driver doesn't > load. - which kernel(s) do you have - which kernel does it require - distribution? I'd guess it's an RH kernel avail off of redhat.com, but matching kernel ver, CONFIG_SMP, CONFIG_MODVERSIONS might be enough to get a module that should at least work if insmod -f'd James
Re: Cookbook way to set up raid1
[ Monday, February 28, 2000 ] Brian Lavender wrote: > The software-RAID Howto is very _unclear_. Did you read this one? The LDP one is ancient (long story) http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/
Re: set block_size question after a power failure
[ Sunday, February 27, 2000 ] [EMAIL PROTECTED] wrote: > e2fsk -f -b 32768 /dev/md0 to repair using the superblocks wouldn't this be pointing the fsck at a non-superblock? Perhaps not, but in my exp. superblocks are typically on 2**n+1 Did you have some indication your primary superblock was corrupted? James
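P.S. One way to sanity-check where the backups should live before feeding -b to e2fsck (illustrative only -- match the -b block size to what the filesystem was actually built with, and -n only prints, it never writes):

    mke2fs -n -b 4096 /dev/md0     # lists the backup superblock locations it *would* use
    e2fsck -f -b 32768 /dev/md0    # then point e2fsck at one of the listed backups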