Re: FAQ update
[Luca Berra] > >The patches for 2.2.14 and later kernels are at > >http://people.redhat.com/mingo/raid-patches/. Use the right patch for > >your kernel, these patches haven't worked on other kernel revisions > >yet. > > i'd add: dont use netscape to fetch patches from mingo's site, it hurts > use lynx/wget/curl/lftp Yes, *please* *please* *please* -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: FAQ
[Luca Berra] > from the info page from gnu tar 1.13.17: > > `--bzip2' > `-I' > This option tells `tar' to read or write archives through `bzip2'. As mentioned previously, this is a distro-specific hack. I have it in my tar as well, but trusting it to be part of core GNU tar just because it works on your system is silly. Version 1.13 is the latest at ftp://ftp.gnu.org/pub/gnu/tar/ and specifically mentions the bzip2 situation in its NEWS file: +++ * An interim GNU tar alpha had new --bzip2 and --ending-file options, but they have been removed to maintain compatibility with paxutils. Please try --use=bzip2 instead of --bzip2. +++ Checking the ChangeLog shows bzip2 support added 1999-02-01 (in the form of -y, --bzip2, and --bunzip2) and then removed 1999-06-16. In any case, it certainly is true that we can trust -z to be around on any standard Linux install, and as such it is the correct answer to this thread. -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: FAQ
[Marc Mutz] > >2.4. How do I apply the patch to a kernel that I just downloaded from > >ftp.kernel.org? > > > >Put the downloaded kernel in /usr/src. Change to this directory, and > >move any directory called linux to something else. Then, type tar > >-Ixvf kernel-2.2.16.tar.bz2, replacing kernel-2.2.16.tar.bz2 with your > >kernel. Then cd to /usr/src/linux, and run patch -p1 < raid-2.2.16-A0. > >Then compile the kernel as usual. > > Your tar is too customized to be in a FAQ. There is no bzip2 standard in GNU tar, so let's be intelligent and avoid the issue by going with the .gz tarball as a recommendation. -z is standard. Also, none of the tarballs will start with "kernel-" but "linux-" anyway, so that needs fixing. Also, I'd add "/path/to/" before the raid in the patch command, since otherwise we'd need to tell them to move the patch over to that directory (pedantic, yes, but still). Oh, and "move any directory called linux to something else" seems to miss the possibility of a symlink, where renaming the symlink would be kind of pointless. Whether tar would just kill the symlink at extract time anyway is worth a check. -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
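For reference, the corrected FAQ steps might read something like this (a sketch only, reusing the file names from the thread and assuming the .gz tarball):
  cd /usr/src
  mv linux linux.old              # or just remove a stale "linux" symlink
  tar -zxvf linux-2.2.16.tar.gz   # -z is standard, unlike the distro-specific -I
  cd /usr/src/linux
  patch -p1 < /path/to/raid-2.2.16-A0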
Re: OT: best cross-OS filesystem
[Edward Schernau] > Sorry to waste bandwidth, but I'm looking at a way for better > cross-OS performance on my "shared" partition - are there ext2fs > drivers for NT somewhere, or maybe hpfs drivers for NT? I have some > very large directories with 100's of files, and I want to be able to > get in and around them easily... FAT32 appears to be the dominant cross-OS filesystem of choice, combining long-filename support with native read-write capability in Linux, 95/98, and NT/2000. James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: Determining a failed device
[Kirk Patton] > The status should be: > md0 : active raid5 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0] > 71681024 blocks level 5, 256k chunk, algorithm 0 [5/5] [U] 5 active, 1 standby (6 raid disks total) > The status is: > md0 : active raid5 sdf1[4] sde1[4](F) sdd1[3] sdc1[2] sdb1[1] sda1[0] > 71681024 blocks level 5, 256k chunk, algorithm 0 [5/5] [U] 5 active, 1 failed (6 total). This is a snapshot after the rebuild has already occurred (or the drive that failed was the spare, but that's unlikely given typical ordering conventions) > I noted the (F) by sde1. Does this stand for > failed? Is there any references to the types of > errors that will be reported in the syslog or > /proc/mdstat? yes, F is failed. > Personalities : [raid5] > read_ahead 1024 sectors > md0 : active raid5 sdg1[6] sdf1[5] sde1[4] sdd1[3] > sdc1[2] sdb1[1](F) sda1[0] 106653696 blocks level > 5, 256k chunk, algorithm 0 [7/6] [U_U] > unused devices: > > Reading this status from /proc/mdstat, I am > thinking that the raid is running in degraded mode > with "sdb1" as the failed drive. The [7/6], does > that mean that there are 7 devices and only 6 are > currently running? yup, that's degraded. You'll want to raidhotremove the sdb1 and raidhotadd a new partition (possibly sdb1 after that drive gets replaced, depending on your controller and other factors) and it'll rebuild onto the new drive. James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
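A minimal sketch of the replacement sequence described above (device names are examples only; adjust for what your controller actually calls the replacement disk):
  raidhotremove /dev/md0 /dev/sdb1   # drop the failed partition from the array
  # swap in the replacement drive, partition it (type fd is a good idea)
  raidhotadd /dev/md0 /dev/sdb1      # start reconstruction onto the new partition
  cat /proc/mdstat                   # watch the rebuild progress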
Re: 2.4.0 autodetect patch
[Nick Kay] > Better still would be a pointer to the linux-raid archives - I can't > find them even if they do exist. It still cracks me up that typing "linux-raid archive" into google returns such a long list yet people swear they can't find them. Interesting. Anyway, everyone's got their favorite, but mine is: http://www.mail-archive.com/linux-raid@vger.rutgers.edu/ -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: raid newbe!
[Fredrik Lindström] > I've been searching for a RAID howto or something like that > What I'm after is the software raid in linux Go to http://www.linuxdoc.org Under HOWTOs, look for "Software RAID" http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: DPT PM3334
This is what I get for not having coffee before reading my email. > > I have been trying to get Red Hat 6.2 to install > > on my DPT PM3334 raid controller, but > > I just read somewhere that Red Hat does not > > support installing the boot partition > > onto the raid array. In all cases I've seen, any device for which there is a supporting module can have that module shoved into an initrd just fine. Any good raid controller only shows a logical disk to the OS anyway, so hardware raid situations are usually much easier for booting off of raid than s/w ones. James, who *still* needs to go downstairs and get some C8-H10-N4-O2 -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
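For what it's worth, shoving a controller module into an initrd on Red Hat is usually a one-liner; a sketch, where both the module name (a guess at the DPT driver) and the version string are just placeholders:
  mkinitrd --with=eata /boot/initrd-2.2.16.img 2.2.16
  # then point initrd= in lilo.conf at the new image and re-run lilo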
Re: DPT PM3334
[Souvigna Phrakonkham] > Hello, has anyone put a boot partition on the > "raid" array drives? If so which > distribution of linux? You can make it work on any distro, but afaik the only installer that currently has "native support" is RH 6.2's gui installer. > I have been trying to get Red Hat 6.2 to install > on my DPT PM3334 raid controller, but > I just read somewhere that Red Hat does not > support installing the boot partition > onto the raid array. It does, for RAID-1 (mirroring): no striping involved, and their lilo has been patched to support the raid-1 device. Using Disk Druid, it's pretty straightforward to make a couple of partitions and "make raid device", so you should be ok. The steps to retrofit a bootable raid onto an existing system are a bit tedious, but covered in the Boot+Root RAID Howto and Software RAID Howto, both at linuxdoc.org. James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: Raid1: How to verify that mirroring is functioning
[root] > Hi, Hello. > I've created mirrored striped arrays (Raid10) and am not confident that > my first striped set is in fact being mirrored on my second striped set. First question: did you make backups? :) > When the mirrored mdX devices are created, cat /proc/mdstat does show > that re-synching is taking place. However, if I mount an mdX that is > part of my second striped set, I see NO files, just a lost+found > directory. Hmm, I didn't mount as read-only. Is this significant? Any chance we could see your /proc/mdstat output? > What techniques can I use to verify that the second striped set is being > mirrored? Is there a raidtool to force resynching? mkraid'ing md10-14 will need to write to the ends of md0-9, possibly corrupting the filesystems already in place (with the blessed data being on md0-4, it would appear). Although it's not broken out as a separate section, the method for getting a mirror made of already in-place data isn't extremely nice, but it has been effective for many in the past. It's covered as "Method 2" at: http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO-4.html#ss4.12 If you have an ext2 resizer that you trust to shrink the fs enough for the raid superblock, you can try that and avoid the step of copying over data manually. Not recommended, of course, but it's a possibility. > If, perchance, an mdX on the first-striped set has a problem, will the > mirrored device kick in and re-synch the striped mdX with the problem? > When this happens (as I'm sure it probably will at some point), how will > I know that it is occurring? I am guessing that the first striped set > will be out of operation until it is repaired by re-synching with the > mirrored set. > > How can mirroring be effectively used & monitored? The major problem here is that once you create (via the failed-disk method) the raid10, you *need* to start mounting the md10-14 devices. Manually dealing with the underlying md0-9 devices isn't supported after that point. It boils down to the fact that raid1 is "write to md10, mirror the writes across md0 and md5" and not "the raid1 module should catch all writes to md0 and automatically mirror them to md5". You have to use the raid1 mdX device you created or you best-case lose raid1 functionality, worst-case lose data. > fstab file: > > /dev/md1  /local  ext2  defaults  1 2 > /dev/md0  /opt    ext2  defaults  1 2 > /dev/md4  /tmp    ext2  defaults  1 2 > /dev/md2  /usr    ext2  defaults  1 2 > /dev/md3  /var    ext2  defaults  1 2 After the "method 2" (failed-disk) steps to get the mirrored/striped raid10's up and running, you'll need to change these by "adding 10" to each (md11, md10, md14, md12, md13) so you're using the raid10 devices and not an underlying raid0 device. HTH, HAND James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4 PGP signature
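To make the "Method 2" (failed-disk) approach concrete, a hypothetical raidtab stanza for one of the mirrors could look like this, with md5 being the fresh striped set and md0 the one holding the existing data:
  raiddev /dev/md10
      raid-level            1
      nr-raid-disks         2
      nr-spare-disks        0
      chunk-size            4
      persistent-superblock 1
      device                /dev/md5
      raid-disk             0
      device                /dev/md0
      failed-disk           1
After mkraid /dev/md10 you mke2fs and mount md10, copy the data over from md0, then change failed-disk back to raid-disk and raidhotadd /dev/md10 /dev/md0 to start the mirror resync. That's only a sketch of the HOWTO's method, not a recipe to run blindly.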
Re: general question.
[Roman Seibel] > comp:~/ # mkraid /etc/raidtab > mkraid version 0.36.4 http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html Specifically, the "requirements" section 1.2 http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO-1.html#ss1.2 HTH, James -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4 PGP signature
Re: 2.2.16, "device too small (0 kB)"
[Marc Haber] > I am trying to build a RAID 1 with two disks on a new system. Linux is > Debian potato, kernel 2.2.16 patched with raid-2.2.16-A0, raidtools > built from raidtools-dangerous-0.90.2116.tar.gz. So far so good. > | Device Boot  Start  End  Blocks  Id  System > |/dev/hda7  38  2501  19792048+  fd  Linux raid autodetect > > | Device Boot  Start  End  Blocks  Id  System > |/dev/hdb7  38  2501  19792048+  fd  Linux raid autodetect Looks fine. > |haber@gwen[7/58]:~$ cat /etc/raidtab > |raiddev /dev/md0 > |raid-level 1 > |nr-raid-disks 2 > |nr-spare-disks 0 > |chunk-size 4 > |persistent-superblock 1 > |device /dev/hda7 > |raid-disk 0 > |device /dev/hdb7 > |raid-disk 1 Also good. > However, when I finally try to build the RAID, this is what happens: > |haber@gwen[8/59]:~$ sudo mkraid /dev/md0 > |handling MD device /dev/md0 > |analyzing super-block > |/dev/hda7: device too small (0kB) > |mkraid: aborted, see the syslog and /proc/mdstat for potential clues. > |haber@gwen[9/60]:~$ cat /proc/mdstat > |Personalities : > |read_ahead not set > |unused devices: > |haber@gwen[10/61]:~$ > > Nothing is written to syslog. Being a non-primary partition shouldn't be a problem (there was the autodetection issue iirc, but that shouldn't matter here). The only time I've seen "device too small" was when I was accessing a device that didn't have a proper /dev entry. The fdisk -l probably only needed /dev/hda to be valid, but for the mkraid to succeed /dev/hda7 will need to be valid (3,7). Not likely, but that's the only time I saw it. -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4 PGP signature
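If a missing /dev entry really is the culprit, creating it is quick (major 3, minor 7 being hda7 on the first IDE channel):
  ls -l /dev/hda7 || mknod /dev/hda7 b 3 7
  chmod 660 /dev/hda7   # match the permissions on your other disk devices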
Re: Packages needed
[Micah Anderson] > According to the RAID HOWTO > (www.linuxdoc.org/HOWTO/Root-RAID-HOWTO-2.html) you are supposed to have > the following packages: [snip] > So, is this HOWTO not useful to me? If that is true - I haven't been able > to find a HOWTO elsewhere that addresses the ".90 raidtools and > accompanying kernel patch to the ...2.2x...series kernels". http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html It would be nice if the Root-RAID-HOWTO desc. included a link -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4 PGP signature
Re: where is the archive?
[Sandro Dentella] >i wanted to browse the mailing -list archive before bothering jou w/ my >problems but I coudn't find any: where are they? http://www.mail-archive.com/linux-raid@vger.rutgers.edu/ Also, read the HOWTO http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO.html -- James Manning <[EMAIL PROTECTED]> GPG Key fingerprint = B913 2FBD 14A9 CE18 B2B7 9C8E A0BF B026 EEBB F6E4
Re: performance statistics for RAID?
[Gregory Leblanc] > Is there any chance of keeping track of these with software RAID? AFAIK, sct's patch to give sar-like data out of /proc/partitions gives all of the above stats and more... neat patch :) The user-space tool should be in the same dir. And, FWIW, I get asked about how people can get a "sar" for Linux *very* often by the SCO people here at work. James
Re: Reiser to the occasion
[Henry J. Cobb] > but if you've got a journaling filesystem, wouldn't you want to expose the > raw disks to it so it can choose to put the journals on different disks > than the files? Funny, since sct/ext3 is the only one that appears to be pushing to keep alive the possibility of journaling to other devices (nvram for one, which is definitely a good idea). In one sense, creating an external dependency for the recovering of your data can be a Bad Thing. > This would not only help with performance, but it would also make recovery > as simple as using one of the surviving journal copies and applying that > against the last full backup of the main file system. (I.e. you lose 10 > disks out of your 12 disk "array" and wind up not losing a single byte of > data.) journals aren't *nearly* that deep. journal transaction entries can get overwritten (circular buffer) as soon as the full transaction has been committed to disk. It does *not* keep all transactions around since your last full backup (how would it even know? :) Journaling != RAID != LVM != Backups. They all serve their own purpose, and invariably trying to use one to cover the tasks of others *will* bite you eventually (as we have seen on this list multiple times) James
Re: Easy way to convert RAID5 to RAID0?
[[EMAIL PROTECTED]] > Yes, I know that. Unfortunately, I'm working on an extremely > insert-heavy application (over 100 million records per day). I would > really like ReiserFS (due to the large file size as well as for the > journaling). I don't see how RAID5 can meet my needs. FWIW, ReiserFS won't get you much unless there are large numbers of files involved. I run s/w raid0 over h/w raid5 with ext2 specifically because it's faster for my situation with relatively low file counts (about 100 files per directory). James
Re: Easy way to convert RAID5 to RAID0?
[[EMAIL PROTECTED]] > I find that my RAID5 array is just too slow for my DB application. I > have a large number of DB files on this array. I would like to > convert to RAID0, and I can back up my files, but I was wondering if > there is a way to convert without reformatting? Not currently, although it may be worth reconsidering a conversion from 5 -> 0 if you can alleviate your performance problems with other methods (chunk size, -R stride=, reiserfs, more memory, etc) Just a thought, although for anything OLTP-ish you're going to be so insert- and update-heavy that I'm sure raid5's going to be less than ideal for some performance requirements... Keep in mind that you won't be able to survive through a disk failure like you can now, though (I know you already know this, just want to rehash :) James
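As an illustration of the -R stride= tuning mentioned above (numbers are hypothetical): with a 64k chunk size and 4k ext2 blocks, stride is chunk/block = 16, so the filesystem would be created with
  mke2fs -b 4096 -R stride=16 /dev/md0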
Re: raid 0 problems after kernel upgrade
[blair christensen] > hello, > rh 6.2 on a dell poweredge 4400 box. it was running 2.2.14-5 with a > raid 0 array. i upgraded the kernel to 2.2.16 and i am now having > problems with the raid device (/dev/md0). You didn't patch your 2.2.16 (www.redhat.com/~mingo/raid-patches). > when i try to mount the device, i get: Check /proc/mdstat before trying mounts, tune2fs, or other things. It should show that you don't have an active md0, so subsequent attempts to use md0 will certainly fail. HTH, HAND James
Re: Raid 5. Lost 2 drives.
[m.allan noah] > > The howto says try mkraid --force. With a 2 drive (2/4) will I lose > > everything. > > why do you want to make a two drive raid5? that makes no sense. use raid1. If you *read* his message you'll notice that he has 4 drives in the array and lost 2 of them (2 still active). :) > yes- if there is data already on the drive, running mkraid is a pretty sure > way to destroy the filesystem, since part of the file system will be > overwritten. Incorrect. if it was a s/w raid device already, then nothing gets touched except the raid super-block that was already there. Resync may occur, but there are mkraid options to keep that from happening too. James
Re: Raid-Failure, please help
[Jochen Haeberle] > does not recreate automatically... The problem mentioned striking me > most is "md0 has overlapping physical units with md2"... this does > not sound very good to me... That's informative about resync operations. It is not an error. > May we run fsck on the md devices??? sure, as long as the devices are active (check /proc/mdstat) James
Re: 2.2.16 RAID patch
[Matthew DeFoor] > I hate to bother the list with this, but...I have been unable to get > Redhat 6.1/2.2.16+raid-2.2.16-A0 working with Root RAID1. > > image=/boot/bzImage > label=linux > initrd=/boot/initrd-2.2.16.RAID.img > read-only > root=/dev/md0 > > request_module[md-personality-3]: Root fs not mounted > do_md_run() returned -22 re-make your initrd and include --with=raid1 (just did the same thing at our installfest last weekend :) James
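For reference, re-making the initrd on Red Hat goes roughly like this (the image name is taken from the lilo.conf above; substitute your real kernel version string, i.e. whatever the directory under /lib/modules is called):
  mkinitrd --with=raid1 /boot/initrd-2.2.16.RAID.img 2.2.16
  lilo   # re-run lilo so the new initrd gets picked up
(Remove or rename the old image first if mkinitrd complains that it already exists.)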
Re: bonnie++ for RAID5 performance statistics
[Gregory Leblanc] > Sounds good, James, but Darren said that his machine had 256MB of ram. I > wouldn't have mentioned it, except that it wasn't using enough, I think. it tries to stat /proc/kcore currently. no procfs and it'll fail to get a good number... I've thought about other approaches, too, but since this is just a fall-back mechanism when the person doesn't specify a size (like they should), I don't give it much worry. Patches always welcome, though, of course :) > a side note, I think that 3x would be a better number than 4, but maybe it's > just me. I've got multiple machines with 256MB of ram, but only 1GB or 2GB > RAID sets. 4x ram would overflow the smaller RAID sets. I've thought about parsing df output of the $dir and clamping on that, but I haven't gotten around to it yet. Keep in mind, this is still all fall-back... you should be passing the right value in the first place :) James
Re: bonnie++ for RAID5 performance statistics
[Gregory Leblanc] > > [root@bod tiobench-0.3.1]# ./tiobench.pl --dir /raid5 > > No size specified, using 200 MB > > Size is MB, BlkSz is Bytes, Read, Write, and Seeks are MB/sec > > Try making the size at least double that of ram. Actually, I do exactly that, clamping at 200MB and 2000MB currently. Next ver will up it to 4xRAM but probably leave the clamps as is. (note: only clamps when size not specified... it always trusts the user) James
Re: Linux raid 5 recovery
[Wishart, Aaron M. (James Tower)] > I have a raid5 file system consisting of 8, 9-gig quantum scsi drives (scsi > id 0-6, 8). The drive with the scsi id of 1 failed. I replaced the drive > and ran "raidhotadd /dev/scb /dev/md0" It appeared to run so I left for the > weekend. When I came in this morning the syslogd was using 75% of the cpu > and outputting "kernel: raid5: md0: unrecoverable error I/O error for block > #" from some kind of loop it apparently failed around 4:00am Saturday ( > I started the restore at about 4:00 Friday afternoon). - you'd typically do something like "raidhotadd /dev/md0 /dev/sdb1" instead, after replacing the disk, making sure it came back as sdb (as per kernel log), fdisk'ing to make a partition with type fd (no, not 100% necessary, but almost always a good idea) then doing the raidhotadd. - After the raidhotadd you'd check /proc/mdstat to confirm the array is reconstructing on the new drive (partition, really). Aside from those two (which I don't think is really the issue, but worth clarifying), I'd say there's the possibility that another drive gave an error (maybe a soft error, the raid code doesn't really differentiate and can get quite picky even if the underlying drive successfully remapped the sector) without the resync completed (resync's seem to take much longer than they should, but maybe that's just me... I mirror entire drives in 20 minutes, but resync's seem to take over a dozen hours) Good luck, James
Re: Forcing Rebuild/Reconstrution
[Peter Hircock] > d) raidhotadd /dev/md2 /dev/hdc3 > Don't have raidhot add. raidhotadd is a symlink to raidstart that gets created when you do the "make install". Might wanna check you've done that and then check that the /sbin directory is in your path (or wherever you installed the raidtools) James
Re: HELP with autodetection on booting
[Gregory Leblanc] > I started seeing this when I blew away my RAID0 arrays and put RAID1 arrays > on my home machine. I suspect that this is cause by RedHat putting > something in the initscripts to start the RAID arrays AND the RAID slices > being set to type fd (RAID autodetect), but I haven't been able to confirm > this. And since I just totaled my RH install, it may be a couple of weeks > before I get back to look some more. Just to confirm :) /etc/rc.d/rc.sysinit will attempt to activate any /etc/raidtab entries that aren't listed as active already in /proc/mdstat. This can certainly be a nuisance in some cases, but I guess they feel it works well in most cases (and they may be right). Certainly can cut down the need for partition types of "fd", although it would appear to be more important to keep your raidtab aligned with reality (that's a good practice anyway, since we may need it for recovery later on). James
Re: HELP with autodetection on booting
[Jieming Wang] > autorun ... > considering sdb1 ... > adding sdb1 ... > adding sda1 ... > created md0 > bind > bind > running: > now! > sdb1's event counter: 000a > sda1's event counter: 000a Looks like a couple of partitions with type fd, looking great for autostart by the raid code. > kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2 > do_md_run() returned -22 Doh! More likely than not, you'll want to build the necessary raid levels into the kernel. Otherwise, you end up in a chicken-and-egg problem (possibly, depending on fs layout) where you need to load a module from a filesystem that you can't get to without the module loaded. James
Re: Any distro with automated raid setup?
[Slip] >I'm wondering if anyone has run into a distribution of linux that >has software raid-util's pre-packaged into it, or available in a third >party package. I'v been trying to setup software raid with three 2.1G >SCSI drives for quite a while now and am simply looking for an easier >sollution. Any pointers/suggestions? FWIW, the Red Hat 6.2 installer is the only one I know of that's software-raid aware enough to create them at install time and even boot from them (raid1 only at the moment). Red Hat 6.2 is also the only distro (AFAIK at least) that has a lilo patched to understand software raid devices (although you can certainly apply the patch yourself or install RH 6.2's lilo package). James
Re: HELP!!! Broken raid0
[Matthew Burke] > On Sun, 28 May 2000, James Manning wrote: > > [Matthew Burke] > > > e2fsck 1.18, 11-nov-1999 for EXT2 FS 0.5b, 95/08/09 > > > e2fsck: Attempt to read block from filesystem resulted in short read while > > > trying to open /dev/md1 > > > Could this be a zero-length partition? > > mdstat: > > Personalities : [raid0] > read_ahead 1024 sectors > md0 : active raid0 hdc1[1] hda3[0] 1606272 blocks 64k chunks > unused devices: No active /dev/md1, so e2fsck failing is normal. > hda: ST36531A, 6204MB w/128kB Cache, CHS=790/255/63, (U)DMA > hdb: IBM-DJNA-351520, 14664MB w/430kB Cache, CHS=1869/255/63, (U)DMA > hdc: ST36531A, 6204MB w/128kB Cache, CHS=13446/15/63, (U)DMA > > *** edited note from matt - the CHS values have always been different for > some unknown reason... AFAIK, you simply have one drive in LBA mode and not the other. In my experience it's just a BIOS setting difference, but you're under 8GB anyway so I'm not sure it really makes a difference. > autodetecting RAID arrays > (read) hda3's sb offset: 787072 [events: 0063] > (read) hda4's sb offset: 5470016 [events: 005e] > (read) hdc1's sb offset: 819200 [events: 0063] > (read) hdc3's sb offset: 5470016 [events: ] > md: invalid superblock checksum on hdc3 Sure makes it look like hdc3 has some major issues. It has a partition type of fd, but invalid raid superblock. Makes me wonder if e2fsck didn't get run on hdc3 itself and it "fixed" that last part (hope not since it may have done some real superblock damage). hdc itself looks ok since hdc1 doesn't seem to have any problems, so I don't think it's an actual drive problem. Unfortunately, since it appears that the raid superblock (at a minimum) is broken on hdc3, the only thing I can think to recommend is - mkraid --force /dev/md1 (rewrites raid superblocks) - try to raidstart /dev/md1 (and hope that the real data is ok) - mount -o ro /dev/md1 /mnt (see if it looks ok) There is the chance that the partition table got slightly corrupted and hdc3's entry has an incorrect value (unlikely, though, since the size matches hda4). Make sure your raidtab matches md1's actual devices before running the --force, of course. Note that "normally" the superblock checksum is fine and the update counter is only a few off from the most recent, so I want to stress that if there is something strange wrong (like a partition table screwup), the writing of the raid superblocks can corrupt data. If this all makes you nervous, feel free to see what others may recommend... I've certainly never dealt with this exact kind of situation before (array recovery attempts for a raid0 array :) James
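Spelled out as commands, the recovery attempt above would be (only after verifying the raidtab matches md1's real members, and with no guarantees for a raid0):
  mkraid --force /dev/md1    # rewrites the raid superblocks
  raidstart /dev/md1         # if autostart hasn't already brought it up
  cat /proc/mdstat           # confirm md1 is active
  mount -o ro /dev/md1 /mnt  # read-only first, to see whether the data survived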
Re: HELP!!! Broken raid0
[Matthew Burke] > e2fsck 1.18, 11-nov-1999 for EXT2 FS 0.5b, 95/08/09 > e2fsck: Attempt to read block from filesystem resulted in short read while > trying to open /dev/md1 > Could this be a zero-length partition? > > /dev/md1 is not mounted, but it is properly set up in /etc/raidtab > > raidstart /dev/md1 produeces no error message, but fails to do anything. Could you paste /proc/mdstat? If the arrays aren't active, fsck won't be able to do anything on them. If the arrays are indeed inactive, some syslog entries that relate to it (autostart'ing, I'd imagine) could be helpful as well. James
Re: Problems creating RAID-1 on Linux 2.2.15/Sparc64
[Ion Badulescu] > In article <[EMAIL PROTECTED]> you wrote: > > > I am having trouble using Linux RAID on a Sun Ultra1 running > > 2.2.15. > > You need an additional patch, just plain vanilla 2.2.15 + raid-0.90 won't > do on a sparc. Red Hat have it in their 2.2.14-12 source rpm, but I'm > attaching it here, for convenience. Actually, I don't believe he's applied the 0.90 patch on top of 2.2.15, given his /proc/mdstat: > > /proc/mdstat remains constant with the following: > > > > Personalities : [1 linear] [2 raid0] [3 raid1] [4 raid5] > > read_ahead not set > > md0 : inactive > > md1 : inactive > > md2 : inactive > > md3 : inactive So he may want to start out with http://people.redhat.com/mingo/raid-patches/raid-2.2.15-A0 first. James
Re: raid5 disk failure
[Jakob Østergaard] > > Set up a raidtab entry **WITH GREAT CARE** specifying the minimal set as > > above, with the oldest partitions `raid-failed'. Now create the device. > > This will write a new set of consistent PSBs. > > Correct. s/raid-failed/failed-disk/ as per section 6.1 http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO-6.html#ss6.1 James
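For illustration, the relevant part of such a raidtab could look like the following; the devices, disk count, and chunk size are made up, the point is only the failed-disk directive on the stale member:
  raiddev /dev/md0
      raid-level            5
      nr-raid-disks         3
      nr-spare-disks        0
      chunk-size            32
      persistent-superblock 1
      device                /dev/sda1
      raid-disk             0
      device                /dev/sdb1
      raid-disk             1
      device                /dev/sdc1
      failed-disk           2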
Re: Will kernel 2.4 include latest RAID patches?
[Marco Shaw] > The 2.4 kernel tree itself will not, but Linux distributions will. RedHat > has been patching their products since 6.1, so I'm thinking SuSE isn't far > behind. Incorrect. As of 2.3.99-pre8, the merge is (mostly) done, with just a few stragglers left to get cleaned up. Once my 8-way Xeon finishes the find | xargs -P 8 bzip2 -9 I've got running, I'm gonna check KNI support. James
Re: md0 won't let go... (dmesg dump...)
[Harry Zink] > While I appreciate the patch/diff provided by James Manning, I am extremely > weary of applying anything to a system that I don't fully understand - > particularly if it is suffixed by "Who knows..." (shiver). I hadn't had a chance to test it... this one (attached) works (I had forgotten to update the index commands in the hd[i-j] and hd[k-l]) > Now, I just need to make sure all devices are attached as Master devices, on > their own controller port, and then figure out what minor and major to set > them at... *ANY* help in allowing me to better understand how that's done, > or in actually doing this will be appreciated. anything on an "even" device (hdb, hdd, hdf, hdh, hdj, hdl, etc) is a slave. the "odd" ones (hda, hdc, etc) are masters > Alright, maybe it's oversimplified, but I grok that part (that the kernel > needs the proper device files, and that I don't have the device files, and > thus need to create them. Actually, the kernel doesn't need the /dev files... user-space programs (fdisk, for instance, possibly mkraid too, not sure) need them as an interface to the devices in the kernel... devfs may make this picture clearer down the road... or muddier :) > Thanks, and thanks to James Manning as well for finally tracking down what > the core of this problem is. MAKEDEV is historically bad about keeping up with devices.txt, so it's fairly common... those mknod's I gave last time should work too > Is there some utility that will quickly and easily create /dev/ files and > provides qualified questions to assist in properly creating /dev/ files? MAKEDEV is a decent shell script, although it's just glorified mknod wrapping when it comes down to it :) reading devices.txt and a mknod --help is about all that can be done for understanding the /dev entries... as to major/minor and why they're still around, "historical cruft" is about it for now. James --- /dev/MAKEDEVThu Mar 2 16:35:20 2000 +++ /tmp/MAKEDEVWed May 17 13:33:35 2000 @@ -180,7 +180,7 @@ do case "$1" in mem|tty|ttyp|cua|cub) ;; - hd) (for d in a b c d e f g h ; do + hd) (for d in a b c d e f g h i j k l; do echo -n hd$d " " done) ; echo ;; @@ -188,6 +188,8 @@ ide1) echo hdc hdd ;; ide2) echo hde hdf ;; ide3) echo hdg hdh ;; + ide4) echo hdi hdj ;; + ide5) echo hdk hdl ;; sd) echo sda sdb sdc sdd ;; sr) echo scd0 ;; st) echo st0 ;; @@ -621,6 +623,28 @@ major=`Major ide3 34` || continue unit=`suffix $arg hd` base=`index gh $unit` + base=`math $base \* 64` + makedev hd$unit b $major $base $disk + for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20 + do + makedev hd$unit$part b $major `expr $base + $part` $disk + done + ;; + hd[i-j]) + major=`Major ide4 56` || continue + unit=`suffix $arg hd` + base=`index ij $unit` + base=`math $base \* 64` + makedev hd$unit b $major $base $disk + for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20 + do + makedev hd$unit$part b $major `expr $base + $part` $disk + done + ;; + hd[k-l]) + major=`Major ide5 57` || continue + unit=`suffix $arg hd` + base=`index kl $unit` base=`math $base \* 64` makedev hd$unit b $major $base $disk for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20
Re: md0 won't let go... (dmesg dump...)
[Harry Zink] > Not sure what this will help, except confirm again that these volumes aren't > accessible, which was my question to start with. Question is "why?", answer is "no appropriate /dev entries" > [root@gate src]# ls -l /dev/hdj1 > ls: /dev/hdj1: No such file or directory > [root@gate src]# ls -l /dev/hdj > ls: /dev/hdj: No such file or directory > [root@gate src]# ls -l /dev/hdk > ls: /dev/hdk: No such file or directory > [root@gate src]# ls -l /dev/hdk1 > ls: /dev/hdk1: No such file or directory That's why you can't fdisk (just as Gregory has pointed out before)... get those created (see previous note as per MAKEDEV)... default setup is 4 IDE controllers (ide[0-3]) which correspond to the 8 IDE devices hd[a-h]... Judging by /usr/src/linux/Documentation/devices.txt, I'd say the major's for these new devices should be 56 and 57, so my guess would be: mknod /dev/hdj b 56 64 mknod /dev/hdj1 b 56 65 mknod /dev/hdk b 57 0 mknod /dev/hdk1 b 57 1 attached is what might be a working MAKEDEV patch... who knows. Bleah, James --- /dev/MAKEDEVThu Mar 2 16:35:20 2000 +++ /tmp/MAKEDEVWed May 17 11:17:28 2000 @@ -188,6 +188,8 @@ ide1) echo hdc hdd ;; ide2) echo hde hdf ;; ide3) echo hdg hdh ;; + ide4) echo hdi hdj ;; + ide5) echo hdk hdl ;; sd) echo sda sdb sdc sdd ;; sr) echo scd0 ;; st) echo st0 ;; @@ -619,6 +621,28 @@ ;; hd[g-h]) major=`Major ide3 34` || continue + unit=`suffix $arg hd` + base=`index gh $unit` + base=`math $base \* 64` + makedev hd$unit b $major $base $disk + for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20 + do + makedev hd$unit$part b $major `expr $base + $part` $disk + done + ;; + hd[i-j]) + major=`Major ide4 56` || continue + unit=`suffix $arg hd` + base=`index gh $unit` + base=`math $base \* 64` + makedev hd$unit b $major $base $disk + for part in 1 2 3 4 5 6 7 8 # 9 10 11 12 13 14 15 16 17 18 19 20 + do + makedev hd$unit$part b $major `expr $base + $part` $disk + done + ;; + hd[k-l]) + major=`Major ide5 57` || continue unit=`suffix $arg hd` base=`index gh $unit` base=`math $base \* 64`
Re: md0 won't let go... (dmesg dump...)
[Harry Zink] >Doing fdisk /dev/hdf works just fine. >Doing fdisk /dev/hdg or /dev/hdk results in the old 'unable to open >hdj/hdk' ls -l /dev/hd[gk]* ... you may need a later MAKEDEV (or edit yours) to create all the necessary files >Alright, try turning off the RAID again ... raidstop -all or raidstop >/dev/md0. >This generates the following: >raidstop /dev/md0 >/dev/md0: Device or resource busy mounted filesystem... clear processes using it and umount it (show df output too) >So, this time it won't let go of hdj and hdk (I moved the drives >around during the rebuild), which *DO* exist, and whose partition ID I >can't change (even though it is currently blank/unformatted) becaused >I can't use fdisk... > >md0 : active raid0 hdh1[1] hdg1[0] 19806976 blocks 16k chunks md is using hdh1 and hdg1 ... it's not using hdj or hdk If you wish them (hdh1, hdg1) to not get run automatically, fdisk them and set the type back to 83 from fd (the autorun consideration proves all these partitions are still "fd") These are all the same things hashed over before, so no, I don't really expect this email to have any real consequence. *sigh* James
Re: md0 won't let go... (dmesg dump...)
[Tommy] > When reading through this, my first impulse is to say that /dev/hdl isn't > correct. When I recently built a raid5 using 3 promise cards, I found > that in spite of the kernel detecting hdk hdm and hdo, these devices were > NOT built in /dev. In fact, I had to dig into the ide header file to even > find the proper MAJOR node settings for the devices. I'd think that, but he's still not put out the /proc/mdstat I asked for multiple times, and the dmesg output he showed didn't have hdl involved in md0 at all. I don't honestly believe hdl, if it even exists, is even remotely involved in s/w raid. I don't see dmesg output that reports an hdl (/dev entries not affecting the kernel, obviously), either. James
Re: md0 won't let go... (dmesg dump...)
[Harry Zink] > autorun ... > considering hdh1 ... > adding hdh1 ... > adding hdg1 ... > created md0 So hdh and hdg certainly both have partitions and both are set to type fd. fdisk to /dev/hdl would seem to be failing because there is no hdl device. If you're trying to "free" hdg and/or hdh, fdisk their type to 83 instead of fd and they won't autostart. If you're trying to do something with hdl (if it exists), md isn't the problem. James
Re: md0 won't let go...
[Harry Zink] > [root@gate Backup]# raidstop /dev/md0 > /dev/md0: Device or resource busy > > (This is normal, the fs is shared by atalk. I disable atalk) > > [root@gate Backup]# raidstop /dev/md0 > /dev/md0: Device or resource busy > > (Now this is no longer normal. No services or anything else is using the > partition. I made sure no one is logged in to that partition. Still, the > same error.) Based on the above, I'd say your md0 is still mounted as a filesystem. umount it, or if you're having real problems getting it umounted add noauto to fstab options for the fs and the next boot shouldn't mount it and raidstop will work fine. If it's not mounted, and you're getting the above errors, please send df output. James
Re: md0 won't let go...
[Harry Zink] > on 5/10/00 2:30 PM, [EMAIL PROTECTED] at [EMAIL PROTECTED] > wrote: > > You probably need to do a 'raidstop' on md0. Then, maybe you can > > fdisk it? > > Been there, done that. > Makes no difference. It just very persistently holds on to these drives. Are you claiming that /proc/mdstat has the md0 active both before and after running raidstop /dev/md0? Just want to clarify. James
Re: What is the "standard" way to delete RAID devices?
[Dave Meythaler] > I have looked through the Software RAID howto, the Bootable RAID howto, the > docs that come with raidtools 0.90 and the man pages and I haven't been able > to find any way to delete a raid device once it has been created. since a raid device is just a virtual block device over other real devices, it is a little vague what you mean by "delete". But, going by what I think you mean, you'll want to: - rename /etc/raidtab (in case your distro has initscripts which try to activate raidtab entries that aren't active in /proc/mdstat) - raidstop the array(s) (check /proc/mdstat) - if their partition types are "fd", make them "83" or another appropriate value so your autodetect doesn't try to find it (although if the superblock isn't valid it won't start an array anyway) - mke2fs (or whatever else) for giving new roles to your now-unused partitions/drives > I'm trying to get rid of a raid device (RAID 0 or 1) which was created using > the "persistent-superblock" option on Red Hat 6.2 (kernel source 2.2.14-12). The persistent superblock isn't persistent in that manner :) Once the array is raidstop'd, you can mke2fs the partition immediately (I do just that all the time checking performance between disks and a s/w raid of them) > Is there some kind of command/tool to do this that I haven't stumbled > across? It would be nice if the howto could say something on this topic. There could be... it'd be small since the above is about it, but it's Jakob's call. James
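Put together as commands, the teardown sketched above looks roughly like this (md0 and its members are placeholders):
  umount /dev/md0                   # if it was mounted
  mv /etc/raidtab /etc/raidtab.old
  raidstop /dev/md0
  cat /proc/mdstat                  # confirm md0 is no longer active
  fdisk /dev/sda                    # set the old members back to type 83 if they were fd
  mke2fs /dev/sda1                  # or whatever new role the partition gets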
Re: System lockup during raidhotadd
[Ian Morgan] > I can raidhotremove the (simulated-)faulty disk, and then physically remove > it. Next, I put the disk back in physically. I then want to run raidhotadd to > add the disk back into the array and begin reconstruction. > > Problem is, when I run raidhotadd, the system totally locks up solid. I've > tried giving it time to come back to life, but nothing happens even after > several minutes, and the system is so dead that the software watchdog is > also toast. In my experience, any drive manipulation (in terms of what's attached, what's seen by the kernel, etc) that locks up the machine has been strictly a device driver problem. Assuming this is SCSI, it may help to do the add-single-device/remove-single-device commands as per drivers/scsi/scsi.c lines 2389 and 2447 respectively (2.2.15 src). If the initial detachment didn't propagate up the device removal through the driver, the reattachment may have caused some problems (creating data structures already there and populated, scribbling over valid values... who knows). Just a guess. > kernel: 2.2.16pre2 SMP reproducible on 2.2.15 proper? > raid: mingo's raid-2.2.15-A0 > tools: raidtools-19990824-0.90 > > Is this a known problem? Am I using the right procedure to replace a faulty > disk? Would a raidstop/raidstart work? Isn't there a way to replace a drive > without taking the array down? The HOWTO is not very detailed in this area > of reconstruction. It makes it sound like this should all be a no-brainer. James
Re: lilo: Sorry, don't know how to handle device 0x0905
[Martin Munt] > Sorry, don't know how to handle device 0x0905 You can avoid the lilo.conf tricks and just use a normal one (avoiding partition=, disk=, etc) if you used a lilo patch with the raid1 support (lilo.raid1) written by Doug Ledford (thanks Doug!). This list's archives have it, or you can simply fetch the lilo package out of RH 6.1 or 6.2 (alien/rpm2cpio to your distro as needed) This is specific to s/w raid1 since other raid levels don't have the kernel contiguous on a physical disk. James
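Under the patched lilo, the lilo.conf really is just the ordinary one pointed at the md device; a hypothetical minimal example:
  boot=/dev/md0
  image=/boot/vmlinuz
      label=linux
      root=/dev/md0
      read-only
Running lilo then installs the boot block on both underlying raid1 members (assuming the raid1-patched lilo; stock lilo will still give the 0x09xx complaint).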
Re: can't locate module block-major-22
[Jason Lin] > After my raid-1 is up and running I shutdown the > machine and took out one hard disk.(the one without > Linux installed.) Just to see how it behaves. > During reboot it drops to single user mode due to RAID > device error. > > "raidstart /dev/md0" raidstart? eww :) > modprobe: can't locate module block-major-22 > /dev/md0: invalid argument Since the first drive (raid-disk 0) is gone, AFAIK you have to get autostart working by doing partition type fd for hd[ac]7 and enabling autostart in the kernel block device section. The raidstart approach (as per Ingo's post of maybe a week ago) will fail if the first disk is unavailable. Thankfully, there's not much reason to avoid autostart these days. > raiddev /dev/md0 > raid-level 1 > nr-raid-disks 2 > nr-spare-disks 0 > chunk-size 4 > persistent-superblock 1 > > device /dev/hdc7 > raid-disk 0 > > device /dev/hda7 > raid-disk 1 > > > raiddev /dev/md1 > raid-level 1 > nr-raid-disks 2 > nr-spare-disks 0 > chunk-size 4 > persistent-superblock 1 /dev/md1 with no disks defined? Guess it doesn't matter since the operations are being done on md0, but it's strange to see the extra (apparently useless?) stanza there. James
Re: celeron vs k6-2
[Seth Vidal] > I did some tests comparing a k6-2 500 vs a celeron 400 - on a raid5 > system - found some interesting results > > Raid5 write performance of the celeron is almost 50% better than the k6-2. Can you report the xor calibration results when booting them? > Is this b/c of mmx or b/c of the FPU? FPU should never get involved (except the FPU registers getting used during MMX operations). As per Greg's report of the K6-2 having MMX instructions, remember that a chip having instructions doesn't mean they get used. Again, this is something that the xor calibrations should help show, though. MTRR could certainly be another source of additional performance, but I haven't dealt with the K6-2 in any capacity so I don't even know whether it has that capability (although I haven't personally heard of anything not based on the P6 core using MTRR) > I used tiobench in sizes of > than 3X my memory size on both systems - > memory and drives of both systems were identical. If possible, let the resync's finish before testing... this can cause a huge amount of variance (that I've seen in my testing). speed-limit down to 0 doesn't appear to help, either (although the additional seeks to get back to the "data" area from the currently resyncing stripes could be the base cause) When looking from a certain realistic POV, it'd be hard to believe that even a P5 couldn't keep up with the necessary XOR operations... is there anything else on the system(s) fighting for CPU time? James
Re: drive XOR cmd for parity generation
[Bill BAO] > is there anybody doing the parity generation by using the > drive XOR cmd (XDWRITE, XDREAD, XDPWRITE) ? > > we will start this kind of work in Linux raid, > want to know anybody else is also doing the same thing. > we're looking for cooperation. When I last reviewed FC-AL, I have to admit that the benefits of the new SCSI commands XDWRITE and XPWRITE (along with BUILD and REBUILD) fascinated me. I can't (at the moment) see this going into the Linux s/w raid, though, mainly because it's so FC-specific (in my experience) and would appear to violate the abstraction layer that the raid code can exist at now. I'm also not sure how much (if any) it buys you when the raid5 can be dispersed over multiple controllers, multiple PCI busses, etc. Since you've (obviously :) thought and considered this more than I, could I talk you into a brief explanation of what (besides the bus transfers 4->2, h/w raid controller actions 6->1) and how this can help s/w raid? It'll also give you a great chance to alleviate any worries and quell any issues before they get brought up :) Thanks, James
[PATCH] 2.2.14-B1 bug in file raid5.c, line 659
Summary: raid5_error needs to handle the first scsi error from a device and do the necessary action, but silently return on subsequent failures. - 3 h/w raid0's in a s/w raid5 - initial resync isn't finished (not important) - scsi error passed up takes out one of the devices bug triggered is when raid5_error is called passing in a device (sde1) that doesn't match against "disk->dev == dev && disk->operational" (mainly because the disk->operational was already set to 0 13 seconds previously when the first scsi error was passed back and sde1 matched) Since multiple scsi errors getting passed back from the same failure seems valid (multiple commands had been sent, and each will fail in turn), we should simply handle the first one and have raid5_error exit quietly on the later ones (re-doing the spare code execution could possibly even cause big problems for multiple available spares). Patch attached. Personalities : [raid5] read_ahead 1024 sectors md0 : active raid5 sde1[2](F) sdd1[1] sdc1[0] 177718016 blocks level 5, 4k chunk, algorithm 0 [3/2] [UU_] unused devices: log attached. James --- linux/drivers/block/raid5.c.origThu Apr 20 11:27:37 2000 +++ linux/drivers/block/raid5.c Thu Apr 20 11:32:16 2000 @@ -611,23 +611,29 @@ PRINTK(("raid5_error called\n")); conf->resync_parity = 0; for (i = 0, disk = conf->disks; i < conf->raid_disks; i++, disk++) { - if (disk->dev == dev && disk->operational) { - disk->operational = 0; - mark_disk_faulty(sb->disks+disk->number); - mark_disk_nonsync(sb->disks+disk->number); - mark_disk_inactive(sb->disks+disk->number); - sb->active_disks--; - sb->working_disks--; - sb->failed_disks++; - mddev->sb_dirty = 1; - conf->working_disks--; - conf->failed_disks++; - md_wakeup_thread(conf->thread); - printk (KERN_ALERT - "raid5: Disk failure on %s, disabling device." - " Operation continuing on %d devices\n", - partition_name (dev), conf->working_disks); - return -EIO; + /* Did we find the device with the error? */ + if (disk->dev == dev) { + /* Did we handle its failure already? */ + if (disk->operational) { + disk->operational = 0; + mark_disk_faulty(sb->disks+disk->number); + mark_disk_nonsync(sb->disks+disk->number); + mark_disk_inactive(sb->disks+disk->number); + sb->active_disks--; + sb->working_disks--; + sb->failed_disks++; + mddev->sb_dirty = 1; + conf->working_disks--; + conf->failed_disks++; + md_wakeup_thread(conf->thread); + printk (KERN_ALERT + "raid5: Disk failure on %s, disabling device." + " Operation continuing on %d devices\n", + partition_name (dev), conf->working_disks); + return -EIO; + } + /* Don't do anything for failures past the first */ + return 0; } } /* Apr 19 16:02:41 rts-test2 kernel: SCSI disk error : host 3 channel 0 id 2 lun 0 return code = 800 Apr 19 16:02:41 rts-test2 kernel: [valid=0] Info fld=0x0, Current sd08:41: sense key None Apr 19 16:02:41 rts-test2 kernel: scsidisk I/O error: dev 08:41, sector 9296408 Apr 19 16:02:41 rts-test2 kernel: interrupting MD-thread pid 2807 Apr 19 16:02:41 rts-test2 kernel: raid5: parity resync was not fully finished, restarting next time. Apr 19 16:02:41 rts-test2 kernel: raid5: Disk failure on sde1, disabling device. Operation continuing on 2 devices Apr 19 16:02:41 rts-test2 kernel: md: recovery thread got woken up ... Apr 19 16:02:41 rts-test2 kernel: md0: no spare disk to reconstruct array! -- continuing in degraded mode Apr 19 16:02:41 rts-test2 kernel: md: recovery thread finished ... 
Apr 19 16:02:41 rts-test2 kernel: md: updating md0 RAID superblock on device Apr 19 16:02:41 rts-test2 kernel: (skipping faulty sde1 ) Apr 19 16:02:41 rts-test2 kernel: sdd1 [events: 0002](write) sdd1's sb offset: 88859008 Apr 19 16:02:41 rts-test2 kernel: sdc1 [events: 0002](write) sdc1's sb offset: 88859008 Apr 19 16:02:41 rts-test2 kernel: . Apr 19 16:02:41 rts-test2 kernel: raid5: restarting stripe
Re: adaptec 2940u2w hangups
Ok, normally I'd not bother with this kind of message, but Brian (Haymore) has been both nice and helpful in my experience, so I'm going to do a little sticking up for him since he's being uselessly railed on :) Note that I specifically hate flaming that doesn't get taken off-list, but as I correct factual error(s), I believe this is still valid for linux-raid. With that said, on with the show. :) [The coolest guy you know] > "Brian D. Haymore" wrote: > > U2W can actually be LVD as well. My Mylex eXtremeRAID 1164 card is U2W > > and LVD so just saying U2W is for sure LVD or SE is wrong. Read the > > manual or read the specs on the manufactures web site. Same message *I* was about to send about my DAC-1164P's too :) > Pardon me for just saying "the U2W" when I meant the entire "2940U2W". This didn't matter. What you *specifically* said in message <[EMAIL PROTECTED]> was > > LVD is for the new U160 protocol and "LVD is for the new U160 protocol" is clearly a board-independent (and factually incorrect) statement. (How incorrect? See below) > And to be fair, you are talking about a card about 10 times more > expensive than the one being discussed in this thread. Don't see what price has to do with fairness here. True the original thread is about the 2940U2W (not that it ends up mattering, see below), but you were responding (initially) to a message that was much more SCSI-generic (active/passive termination WRT the terminator [EMAIL PROTECTED] had bought at a computer store)... but, I digress. > The "Adaptec 2940U2W" does not specifically support LVD like the > "Mylex Extreme RAID 1100 Ultra2 Wide LVD SCSI PCI RAID Controller". Glad you cleared that up... Can you correct Adaptec? I guess they don't know the hardware they build :) http://www.adaptec.com/support/faqs/aha2940u2whardware.html#1 Q: What is the SCSI Card 2940U2W? A: The SCSI Card 2940U2W (or AHA-2940U2W) is the latest in the line of Adaptec PCI host adapters. It has the latest SCSI Ultra2 technology which uses Low Voltage Differential (LVD) circuitry designed into the CMOS to provide a bandwidth that is up to twice the current Ultra speeds and with cable lengths up to 25 meters. They get it "wrong" in other places, too, like listing the 2940U2W under the "Low Voltage Differential / Ultra2 PCI SCSI" section at http://www.adaptec.com/support/files/drivers.html (Don't they know that "LVD is for the new U160 protocol"?) > Adaptec also does not specifically support Linux the way Mylex does. Maybe you talk to Ledford about what level of interaction he has with Adaptec engineers and rethink this statement :) HTH, HAND, (wow, that *was* therapeutic!) James
Re: Combining RAID 0 and RAID 1
[Gregory Leblanc] > > Recovery is a tad simpler with raid1 done at the lower level simply > > because none of the md device ever "dies", just one falls > > into degraded > > and you can skip an mkraid and let normal recovery take over. > > Of course, > > that leaves the raid1 read balancing algorithm (arguably the > > weak point in > > the read performance of 0+1 or 1+0) running in two places > > instead of one. > > Could you elaborate a little? Are you talking about the default 0.90 code, > or patched with Mika's brilliant patch? Theoretically, RAID1+RAID0 should > be extreemly fast for reads, and only a bit slower for writes, assuming that > you're not saturating the bus. Mika's patch is a straightforward one that improves small, random (ie seek-heavy) reads well. I haven't seen it (in my experience) improve large sequential reads to the point of raid0 (just in my testing), but it's an issue Mika and I have hashed over many other times, and it's not worth banging over again on this list. Thankfully, it's now a largely moot issue in the cases I need as madvise(MADV_SEQUENTIAL) is around so I can get async forward page-in's (the main reason I don't care about seq raid1 read perf much anymore, and why I added the mmap/madvise code to tiobench) James
Re: Combining RAID 0 and RAID 1
[Werner Reisberger] > I am wondering if there is a possibility to use RAID 0 and RAID 1 together, > i. e. mirroring two RAID 0 devices? Absolutely. The most common setup appears to be: drives 1+2: md0 (raid0) drives 3+4: md1 (raid0) md0+md1: md2 (raid1) > Two general questions: > > - Are there any instructions for the new raidtools what to do in cases >of disk or power failures? I only found partial outdated hints in the old >HOWTO. The new howto should cover it well (now in the LDP, at http://linuxdoc.org/HOWTO/Software-RAID-HOWTO.html), but for the above scenario, the failing drive should take down the appropriate md device (md0 or md1) and then the md2 device should fall into degraded mode. Regular recovery techniques (sections 5 and 6 cover them well) to get the supporting raid0 device's drive replaced and the device re-mkraid'ed, then raidhotadd to bring back md2. Recovery is a tad simpler with raid1 done at the lower level simply because none of the md devices ever "dies", just one falls into degraded and you can skip an mkraid and let normal recovery take over. Of course, that leaves the raid1 read balancing algorithm (arguably the weak point in the read performance of 0+1 or 1+0) running in two places instead of one. Probably a common enough request to warrant a howto subsection :) > - Is there an archive for this mailing list? If not I could set up one. http://www.mail-archive.com/linux-raid@vger.rutgers.edu/ James
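As a sketch, the raidtab for that layout might read as follows (drive names and chunk sizes are hypothetical):
  raiddev /dev/md0
      raid-level            0
      nr-raid-disks         2
      chunk-size            32
      persistent-superblock 1
      device                /dev/sda1
      raid-disk             0
      device                /dev/sdb1
      raid-disk             1

  raiddev /dev/md1
      raid-level            0
      nr-raid-disks         2
      chunk-size            32
      persistent-superblock 1
      device                /dev/sdc1
      raid-disk             0
      device                /dev/sdd1
      raid-disk             1

  raiddev /dev/md2
      raid-level            1
      nr-raid-disks         2
      nr-spare-disks        0
      chunk-size            4
      persistent-superblock 1
      device                /dev/md0
      raid-disk             0
      device                /dev/md1
      raid-disk             1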
Re: RAID1: how to control which disk is syn'ed to which.
[Jason Lin] > After installing RedHat6.1 on /dev/hda > I added 2nd hard disk, /dev/hdc, which has same > capacity as /dev/hda. > Then a RAID1 device, /dev/md0, was created with > /dev/hda2 and /dev/hdc2 as the constituent partitions. > (/dev/hda2 contains data for /home) > > Is there a way to control which disk is syn'ed to > which? "failed-disk" directive. As an example, you can check out "Method 2" of the "Root filesystem on RAID" section (although in your case it's /home so life is a little easier) at http://www.linuxdoc.org/HOWTO/Software-RAID-HOWTO-4.html#ss4.12 Jakob: how do you feel about a section that covers the "mirroring of currently existing filesystem" case? mirroring /home when a spare drive becomes available could be quite useful :) James
Re: a raid configuration & questions about battery
[David Konerding] > 4 drives 36gig Ultra 2 SCSI (or LVD? or Ultra 3?) (3 active drives & 1 hot > spare) Make sure to consider 4-drive raid1 as well > From poking around the kernel, and reading some stuff on web sites, and > visiting the vendor websites, it seems like the less expensive cards > ($500-1000) typically don't have a battery backup for the cache on the card. > I was thinking, however, that > the UPS makes the cache battery unecessary. Is this a valid belief? Or is > there a situation where having the battery backup > is a good idea? I personally trust my UPS just fine. battery-backed write cache is (IMHO) more a check-mark on Draconian TPC-type auditing to ensure recovery capability. > Also, exactly what will having SAF/TE support on the card and the drive > enclosure gain me? Any pointers to SAF/TE documentation online would be > appreciated. http://www.safte.org/ > Will I save a lot of $$$ by eliminating the requirement for hot-swap and > SAF/TE on the rackmount enclosure? Probably not, and when things go bad, life is much easier with a nice SAF-TE compliant enclosure to work with. James
Re: IO-APIC interrupts (was System Hangs -- Which Is...)
[[EMAIL PROTECTED]] > I'm in the same boat. How do you enable IO-APIC support in the > kernel? CONFIG_SMP implies it, and recent 2.3.x (may have been backported) will allow a UP kernel to use IO-APIC (Ingo's work) although I haven't seen a machine (personally) where that's helpful :) > What is MTRR and how is it enabled? CONFIG_MTRR=y Snipped from Documentation/Configure.help: MTRR control and configuration CONFIG_MTRR On Intel P6 family processors (Pentium Pro, Pentium II and later) the Memory Type Range Registers (MTRRs) may be used to control processor access to memory ranges. This is most useful when you have a video (VGA) card on a PCI or AGP bus. Enabling write-combining allows bus write transfers to be combined into a larger transfer before bursting over the PCI/AGP bus. This can increase performance of image write operations 2.5 times or more. This option creates a /proc/mtrr file which may be used to manipulate your MTRRs. Typically the X server should use this. This should have a reasonably generic interface so that similar control registers on other processors can be easily supported. The Cyrix 6x86, 6x86MX and M II processors have Address Range Registers (ARRs) which provide a similar functionality to MTRRs. For these, the ARRs are used to emulate the MTRRs, which means that it makes sense to say Y here for these processors as well. The AMD K6-2 (stepping 8 and above) and K6-3 processors have two MTRRs. The Centaur C6 (WinChip) has 8 MCRs, allowing write-combining. All of these processors are supported by this code. The Centaur C6 (WinChip) has 8 MCRs, allowing write-combining. These are supported. Saying Y here also fixes a problem with buggy SMP BIOSes which only set the MTRRs for the boot CPU and not the secondary CPUs. This can lead to all sorts of problems. You can safely say Y even if your machine doesn't have MTRRs, you'll just add about 9K to your kernel. See Documentation/mtrr.txt for more information. James Manning
Re: RAID5 array not coming up after "repaired" disk
[Marc Haber] > |autorun ... > |considering sde7 ... > |adding sde7 ... > |adding sdd7 ... > |adding sdc7 ... > |adding sdb7 ... > |adding sda7 ... > |created md0 Ok, maybe I'm on crack and need to lay off the pipe a little while, but it appears that sdf7 doesn't have a partition type of "fd" and as such isn't getting considered for inclusion in md0. sde7 failure + lack of available sdf7 == 2 "failed" disks == dead raid5 James, waiting for the inevitable smack of being wrong
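P.S. If that guess is right, the fix is just setting the partition type (device and partition number here taken from the guess above, so double-check against the real layout):

    fdisk -l /dev/sdf          # confirm sdf7's current type
    fdisk /dev/sdf             # then: t, 7, fd, w  -- set partition 7 to type fd (Linux raid autodetect)

then reboot (or raidstart) so autorun can consider it.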
Re: raidtools-0.90 ioctl
[Michael T. Babcock] > And where can I find err # 22 ... or is it not defined yet? Defined in <errno.h> as EINVAL James
Re: Software RAID with kernel 2.2.14
[flag] > And if I get the same msg when I try to build a raid 0? (my kernel > is RAID patched: 2.2.14) > > [flag@Luxor flag]$ cat /proc/mdstat > Personalities : [1 linear] [2 raid0] [3 raid1] [4 raid5] > read_ahead not set > md0 : inactive > md1 : inactive > md2 : inactive > md3 : inactive allan's right, this is an unpatched kernel. James
Re: newbie needs help
[Wolfram Lassnig] > I´m using a SuSE 6.3, Linux version 2.2.13 ([EMAIL PROTECTED]) > > is it the wrong kernel patch (SuSE does not respond on my queries) SuSE doesn't patch their kernels Excellent software raid howto: http://linuxdoc.org/HOWTO/Software-RAID-HOWTO.html kernel 2.2.14 patch: http://people.redhat.com/mingo/raid-patches/raid-2.2.14-B1 James
[PATCHES] Re: mkraid secret flag
Patches attached: #1: allan noah's suggestion (small warning, 5 seconds, that's it) #2: untested "it compiles" patch for warning file (with Seth's 2 week recommendation on time-span) [ Saturday, March 18, 2000 ] m. allan noah wrote: > think about it! rm by default does not -i! true, although most systems (just going by RH's volume) have alias rm="rm -i" for root (as well as a couple of other possibly-destructive commands) > i feel that mingo/gadi et al have done a fine job, and these utils need to > take the same approach as other system level programs- no convoluted messages > asking for non-disclosure, just the normal warning, and the five second pause. > raid 0.90 is almost grown up. it should act that way. raid 0.90 maturity is orthogonal to the issue of whether we want to warn people on a potentially destructive command. The motivation "It really sucks to LOSE DATA!" applys equally well to Bug-Free (tm) kernel code as to stuff in development (ie, you're willing to destroy what's on disk). In any case, since the patches are small and easy to get almost any warning behavior desired (or none at all), it'll boil down to distro preference anyway. James --- raidtools-0.90/mkraid.c.origSun Mar 19 03:31:48 2000 +++ raidtools-0.90/mkraid.c Sun Mar 19 03:33:46 2000 @@ -68,7 +68,6 @@ int version = 0, help = 0, debug = 0; char * configFile = RAID_CONFIG; int force_flag = 0; -int old_force_flag = 0; int upgrade_flag = 0; int no_resync_flag = 0; int all_flag = 0; @@ -79,8 +78,7 @@ enum mkraidFunc func; struct poptOption optionsTable[] = { { "configfile", 'c', POPT_ARG_STRING, &configFile, 0 }, - { "force", 'f', 0, &old_force_flag, 0 }, - { "really-force", 'R', 0, &force_flag, 0 }, + { "force", 'f', 0, &force_flag, 0 }, { "upgrade", 'u', 0, &upgrade_flag, 0 }, { "dangerous-no-resync", 'r', 0, &no_resync_flag, 0 }, { "help", 'h', 0, &help, 0 }, @@ -116,12 +114,8 @@ } } else if (!strcmp (namestart, "raid0run")) { func = raid0run; - if (old_force_flag) { - fprintf (stderr, "--force not possible for raid0run!\n"); - return (EXIT_FAILURE); - } if (force_flag) { - fprintf (stderr, "--really-force not possible for raid0run!\n"); + fprintf (stderr, "--force not possible for raid0run!\n"); return (EXIT_FAILURE); } if (upgrade_flag) { @@ -167,23 +161,6 @@ if (getMdVersion(&ver)) { fprintf(stderr, "cannot determine md version: %s\n", strerror(errno)); - return EXIT_FAILURE; -} - -if (old_force_flag && (func == mkraid)) { - fprintf(stderr, - -"--force and the new RAID 0.90 hot-add/hot-remove functionality should be\n" -" used with extreme care! If /etc/raidtab is not in sync with the real array\n" -" configuration, then a --force will DESTROY ALL YOUR DATA. It's especially\n" -" dangerous to use -f if the array is in degraded mode. \n\n" -" PLEASE dont mention the --really-force flag in any email, documentation or\n" -" HOWTO, just suggest the --force flag instead. Thus everybody will read\n" -" this warning at least once :) It really sucks to LOSE DATA. If you are\n" -" confident that everything will go ok then you can use the --really-force\n" -" flag. 
Also, if you are unsure what this is all about, dont hesitate to\n" -" ask questions on [EMAIL PROTECTED]\n"); - return EXIT_FAILURE; } --- raidtools-0.90/mkraid.c.origSun Mar 19 03:31:48 2000 +++ raidtools-0.90/mkraid.c Sun Mar 19 03:55:19 2000 @@ -68,7 +68,6 @@ int version = 0, help = 0, debug = 0; char * configFile = RAID_CONFIG; int force_flag = 0; -int old_force_flag = 0; int upgrade_flag = 0; int no_resync_flag = 0; int all_flag = 0; @@ -79,8 +78,7 @@ enum mkraidFunc func; struct poptOption optionsTable[] = { { "configfile", 'c', POPT_ARG_STRING, &configFile, 0 }, - { "force", 'f', 0, &old_force_flag, 0 }, - { "really-force", 'R', 0, &force_flag, 0 }, + { "force", 'f', 0, &force_flag, 0 }, { "upgrade", 'u', 0, &upgrade_flag, 0 }, { "dangerous-no-resync", 'r', 0, &no_resync_flag, 0 }, { "help", 'h', 0, &help, 0 }, @@ -116,12 +114,8 @@ } } else if (!strcmp (namestart, "raid0run")) { func = raid0run; - if (old_force_flag) { - fprintf (stderr, "--force not possible for raid0run!\n"); - return (EXIT_FAILURE); - } if (force_flag) { - fprintf (stderr, "--really-force not possible for raid0run!\n"); + fprintf (stderr, "--force not possible for raid0run!\n"); return (EXIT_FAILURE); } if (upgrade_flag) { @@ -170,8 +164,17 @@ return EXIT_FAILURE; } -if (old_force_flag && (func == mkraid)) { - fprintf(stderr, +if (force_flag
Re: Patch Application Problem
[ Saturday, March 18, 2000 ] Brian Lavender wrote: > I am trying to apply the raid patch to the 2.2.14 kernel > and I get this error. What is wrong? 1) Great reason to use --dry-run with patch so you can spot possible problems before writing to your source tree. > everest:/usr/src/linux# patch -p1 < raid-2.2.14-B1.patch > patching file `init/main.c' > Hunk #2 FAILED at 488. > Hunk #3 succeeded at 940 with fuzz 2 (offset 12 lines). > Hunk #4 FAILED at 1438. > 2 out of 4 hunks FAILED -- saving rejects to init/main.c.rej > patching file `include/linux/raid/linear.h' > patching file `include/linux/raid/hsm_p.h' > patching file `include/linux/raid/md.h' > patch: malformed patch at line 411: rint_devices(); } 2) In every other case like this it's been a corrupted download (lynx print, netscape save as, whatever), so I'd probably recommend something along the lines of wget, snarf, greed, etc. James
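P.S. The --dry-run dance, for the archives (using the filename from the report above -- adjust the path to wherever the patch actually lives):

    cd /usr/src/linux
    patch -p1 --dry-run < /path/to/raid-2.2.14-B1.patch   # only reports, never writes
    # if and only if that shows no FAILED hunks:
    patch -p1 < /path/to/raid-2.2.14-B1.patch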
mkraid secret flag
[ Wednesday, March 15, 2000 ] root wrote: > > mkraid --**-force /dev/md0 /me attempts to get the Stupid Idea Of The Month award Motivation: trying to keep the Sekret Flag a secret is a failed effort (the number of linux-raid archives, esp. those that are searchable, makes this a given), and a different approach could help things tremendously. *** Idea #1: How about --force / -f look for $HOME/.md_force_warning_read and if not exists: - print huge warning (and beep thousands of times as desired) - creat()/close() the file if exists: - Do the Horrifically Dangerous stuff Benefit: everyone has to read at least once (or at a minimum create a file that says they've read it) Downside: adds a $HOME/ entry, relies on getenv("HOME"), etc. *** Idea #2: --force / -f prints a warning, prompts for input (no fancy term tricks), and continues only on "yes" being entered (read(0,..) so we can "echo yes |mkraid --force" in cases we want it automated). Benefit: warning always generated Downside: slightly more complicated to script Both are fairly trivial patches, so I'll be glad to generate the patch for whichever (if either :) people seem to like. James
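P.S. Idea #2 is really nothing more than the sketch below (plain C, not the actual mkraid patch, and the device name is only for show):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* return non-zero only if the user (or a pipe) feeds us "yes" on stdin */
    static int confirm_destroy(const char *dev)
    {
        char answer[16];
        ssize_t n;

        fprintf(stderr,
                "WARNING: initializing %s will destroy any data on its disks.\n"
                "Type \"yes\" to continue: ", dev);
        n = read(0, answer, sizeof(answer) - 1);   /* fd 0, so "echo yes |" still works */
        if (n <= 0)
            return 0;
        answer[n] = '\0';
        return strncmp(answer, "yes", 3) == 0;
    }

    int main(void)
    {
        if (!confirm_destroy("/dev/md0")) {
            fprintf(stderr, "aborted.\n");
            return 1;
        }
        /* ... the Horrifically Dangerous stuff would go here ... */
        return 0;
    }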
Re: IBM ServeRAID Benchmark
[ Tuesday, March 14, 2000 ] Christian Robottom Reis wrote: > Just FYI, a run on a Netfinity 5000 with a ServeRAID card and two IBM 8G > LVD disks plugged into a backplane. I can dig up the model if it makes > things more meaningful. mem=16M, runlevel 1, numruns 5.. you know the > drill. AFAICS to me the ServeRAID is LVD as well, which should give us > 80Mb/s max theoretical throughput. which backplane? first-rev timpani enclosure (just an ibm repackage after buying a company) had problems that put a limit around 11-12 MB/sec in what you could get (which made my tracing efforts take a *long* time at 16GB per trace :) later rev and piano should have that fixed. Also make sure you use the *latest* possible firmware on ServeRAID cards. I finally got a good benchmarking and firmware analysis system cooked up for them, but they've only been using it for the past few months, so later versions have gotten much better. (overview at http://sublogic.com/autotrace/ with visual explanation in the slide at http://sublogic.com/autotrace/slides/sld002.htm) Might wanna try later ips drivers if possible... it's still a fairly new driver, and should be improving still.
> Size is MB, BlkSz is Bytes, Read and Write are MB/sec, Seeks are Seeks/sec
>
> Dir    Size  BlkSz  Thr#  Read (CPU%)      Write (CPU%)     Seeks (CPU%)
> ------ ----- ------ ----- ---------------- ---------------- ---------------
> /usr/   512   4096     1  11.3997   4.96%   6.99304   4.27%  149.242  0.77%
> /usr/   512   4096     2  11.8671   5.47%   6.95879   4.25%  195.759  0.97%
> /usr/   512   4096     4  12.2617   5.69%   6.94252   4.27%  223.820  1.14%
> /usr/   512   4096     8  12.3979   5.78%   6.93575   4.29%  250.433  1.41%
> /usr/   512   4096    16  12.3850   5.82%   6.93202   4.32%  277.247  1.44%
> /usr/   512   4096    32  12.1949   5.82%   6.92113   4.34%  297.975  1.50%
> /usr/   512   4096    64  11.7323   5.81%   6.87402   4.37%  314.251  1.59%
I hate the Write field... it's such a lie :) it's not "multi-threaded" it's "single-threaded with (thread#-1) pauses"... ugh, that's going to get changed. James
Re: Old RAID HOWTO query?
[ Monday, March 13, 2000 ] Gregory Leblanc wrote: > What version of the RAIDtools and kernel drivers does the old > Software-RAID-HOWTO apply to? I need to make sure I've got it right. The coded checks were < 0.90, but the latest to ever show up was 0.50beta3 (kernel.org/pub/linux/daemons/raid/) James
in search of good gnuplot output
As tiotest's funnyscripts/ directory is largely (if not wholly) outdated and broken, I've tried a first-pass perl script replacement for makeimages.sh that takes the same params as the tiobench.pl perl script and makes a gnuplot output. Currently only plots the read performance (will be fairly easy to extend later)... it's currently intentionally fairly simple until output format(s) are stable. This is mainly to solicit input on what valuable gnuplot output could look like. I'm not against surface plots, but trying to figure out good x, y, and z variable selections for them hasn't been working well for me :) Example output from this command: funnyscripts/makeimages.pl --threads 1 --threads 2 --threads 4 --threads 6 --threads 8 --threads 10 --threads 12 --threads 16 --threads 20 --threads 24 --dir /tmp --dir /src is located here: http://sublogic.com/reads.png James

#!/usr/bin/perl -w
#Author: James Manning <[EMAIL PROTECTED]>
# This software may be used and distributed according to the terms of
# the GNU General Public License, http://www.gnu.org/copyleft/gpl.html
#
#Description:
# Perl wrapper for calling tiobench.pl and displaying results
# graphically using gnuplot

use strict;

my $args = join(" ",@ARGV);
my %input_fields;
my %output_fields;
my %values_present;
my %data;
my $dir;
my $size;
my $blk;
my $thr;
my $read;
my $read_cpu;
my $field;
my $write;
my $write_cpu;
my $seek;
my $seek_cpu;

open(TIO,"tiobench.pl $args 2> /dev/null |") or die "failed on tiobench";
while(<TIO> !~ m/^---/) {} # get rid of header stuff
while(my $line = <TIO>) {
    $line =~ s/^\s+//g; # remove any leading whitespace
    ($input_fields{'dir'},$input_fields{'size'},
     $input_fields{'blk'},$input_fields{'thr'},
     $output_fields{'read'}, $output_fields{'read_cpu'},
     $output_fields{'write'}, $output_fields{'write_cpu'},
     $output_fields{'seek'}, $output_fields{'seek_cpu'}
    ) = split(/[\s%]+/, $line);
    foreach $field (keys %input_fields) { # mark values that appear
        $values_present{$field}{$input_fields{$field}}=1;
    }
    foreach $field (keys %output_fields) { # mark values that appear
        $data{$input_fields{'dir'}}{$input_fields{'thr'}}{$field}
            =$output_fields{$field};
    }
}

my $gnuplot_input = "\n".
    "set terminal png medium color;\n".
    "set output \"reads.png\";\n".
    "set title \"Reads\";\n".
    "set xlabel \"Threads\";\n".
    "set ylabel \"MB/s\";\n".
    "plot ";

my @gnuplot_files;
foreach my $dir (sort keys %{$values_present{'dir'}}) {
    my $file="read_dir=$dir";
    $file =~ s#/#_#g;
    push(@gnuplot_files,"\"$file\" with lines");
    open(FILE,"> $file") or die $file;
    foreach my $thr (sort {$a <=> $b} keys %{$values_present{'thr'}}) {
        print FILE "$thr $data{$dir}{$thr}{'read'}\n";
        print "DEBUG: $thr $data{$dir}{$thr}{'read'}\n";
    }
    close(FILE);
}
$gnuplot_input .= join(", ",@gnuplot_files) . ";\n";

print "DEBUG: feeding gnuplot $gnuplot_input";
open(GNUPLOT,"|gnuplot") or die "could not run gnuplot";
print GNUPLOT $gnuplot_input;
close(GNUPLOT);
Re: tiotest on SMP systems...
[ Saturday, March 11, 2000 ] Gregory Leblanc wrote: > I've got a dual proc SS20 that I'm using at my toy here. I'm running > tiobench/tiotest on this machine to test out the raw performance of these > disks, but I was sort of wondering what that (CPU%) number means on an SMP > machine. Does it represent XX% of the total CPU cycles available are being > used, or does it represent that XX% of the 1 CPU's cycles are being used? > Seems to me that the threading would allow it to easily split onto multiple > CPUs, but then what does the (CPU%) represent on the single threaded test? the CPU % is in terms of a single CPU. the below is on my home dual celery
[root@ns1 tiotest-0.25]# ./tiobench.pl --size 16
Size is MB, BlkSz is Bytes, Read and Write are MB/sec, Seeks are Seeks/sec
Dir    Size  BlkSz  Thr#  Read (CPU%)      Write (CPU%)     Seeks (CPU%)
------ ----- ------ ----- ---------------- ---------------- ---------------
.        16   4096     1  242.571   90.9%  6.00456   7.88%  53944.7  97.9%
.        16   4096     2  269.951   143.%  5.97565   8.21%  61718.8  138.%
.        16   4096     4  279.769   157.%  5.94349   8.04%  64585.5  156.%
.        16   4096     8  284.229   164.%  5.81558   7.81%  66145.7  165.%
James
Re: raid0145-19990824-2.2.11.gz
[ Thursday, March 9, 2000 ] Arthur Erhardt wrote: > I just tried to patch a Linux 2.2.14 kernel For 2.2.14 apply http://www.redhat.com/~mingo/raid-patches/raid-2.2.14-B1
Re: patch fails
[ Thursday, March 9, 2000 ] Frank Joerdens wrote: > After trying to apply raid0145-19990824-2.2.11 to a 2.2.13 kernel > > /usr/src/linux/arch/i386/defconfig.rej > /usr/src/linux/arch/sparc64/kernel/ioctl32.c.rej > /usr/src/linux/drivers/block/ll_rw_blk.c.rej > /usr/src/linux/include/asm-ppc/md.h.rej Safe to ignore, as is the one or two you get applying to 2.2.12 > I also tried patching a 2.0.36, a 2.2.14 and a 2.2.12 kernel, all with > similar results. Don't bother with 2.0.36 For 2.2.14 apply http://www.redhat.com/~mingo/raid-patches/raid-2.2.14-B1 James
Re: how to test the performance ?
[ Thursday, March 9, 2000 ] octave klaba wrote: > I see in some emails the tables with the tests: > cpu charge, Mb/sec etc the nicely formatted tables come out of the perl script tiobench.pl in the tiotest package mirrored at http://sublogic.com/tio/ (at least until Mika gets his moving finished :) There's also bonnie at http://www.textuality.com/bonnie/ although for raid or drive testing, I'm not sure what bonnie buys you over tiobench's single-threaded test run... hmmm James
Re: question on raid
[ Thursday, March 9, 2000 ] Benny HO wrote: > I am trying to setup a linear mode to expand my drive. > > I did exactly what is said in the How-to doc. Which one? The LDP one is (checking as I write this) outdated. http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ > Then I run " mkraid /dev/md0" > It returns > Destroying the contents of the /dev/md0 in 5 seconds.. > Handling MD device /dev/md0 > analyzing super-block > disk 0: /dev/hda6 . > disk 1: /dev/hdb1 . > > /dev/md0 Invalid argument Could you dump out anything that showed up in /var/log/messages (at the end of it) or relevant things at the end of "dmesg" output? Could you also include the contents of your /proc/mdstat? Could you also include the contents of your /etc/raidtab? > I am running RedHat Linux 6.0 with kernel 2.2.5-15 I actually don't remember whether that kernel was patched or unpatched, and I've never done linear mode so I'm not sure there's a huge difference (although using old mdtools vs. new raidtools is one obvious one) James
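P.S. For comparison, a linear-mode raidtab for the new tools would look something like the sketch below (partition names from the mail above; the chunk-size value is just a placeholder mkraid insists on seeing) -- worth diffing against whatever the real raidtab says:

    raiddev /dev/md0
        raid-level              linear
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              32
        device                  /dev/hda6
        raid-disk               0
        device                  /dev/hdb1
        raid-disk               1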
Re: Benchmarking.. how can I get more out of my box?
[ Tuesday, March 7, 2000 ] Matthew Clark wrote: > Hey guys.. I just installed and ran iozone.. neat tool.. > > When the file size reaches 32Mb, I see a huge drop from around 129Mb/sec > (obviously caching effects) right down to 10Mb/sec... then at 64Mb it drops > to between 2.5 and 6.7 Mb/sec depending on record/block size... Could you try bonnie (textuality.com/bonnie) or tiotest (mirror available at sublogic.com/tio that includes the mmap code as 0.25)? The second opinions they offer would be interesting to see. > I have a Dual Intel PIII 500 system with 256Mb of main Memory... It has a > Hardware RAID 5 system on 5 18 Gb Seagate Barracuda drives spread over 3 LVD > SCSI channels on a Megaraid controller. I have the latest megaraid source > (1.05) from ami.com. what parameters did you use making the h/w array? (write-through vs write-back, stripe size, etc) James
Re: SW-Raid1 over network block devices
[ Monday, March 6, 2000 ] Holger Kiehl wrote: > node2: 2 x PII-350 128MB with 5 disks used as one single > SW-Raid5, kernel 2.2.14 + mingos patch could you try 2 things? 1) UP kernel 2) kernel 2.3.30 (SMP and then UP if still locks) > Is it a problem that /dev/nd1 lies on another SW-Raid? ie. Part of a raid1 > on top of a raid5. nbd's been historically flaky, with local-loopback, UP kernel situations being the only really tested scenario :) James
Re: RaidTools won't compile correctly
[ Sunday, March 5, 2000 ] Slip wrote: > And suggestions greatly appreciated! You may want to read the new Software-RAID howto at http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ Specifically section 1.2 "Requirements" Note that the current, supported raid uses raidtools-0.90 (it's in the "alpha" subdirectory from that place you got everything else). Note that it will require a kernel patch, but trust us, you'll thank us later :) James
Re: autorun
[ Saturday, March 4, 2000 ] Steve wrote: > request_module[md-personality-3]: Root fs not mounted It would appear that you'd need to build in the raid level support instead of making it a module. Main problem being that since root's not mounted (chicken-and-egg in this case), you have nowhere to load the correct module from. Hence you'll need to rebuild the kernel and build in the necessary support instead of having it as a module. James
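P.S. In .config terms the difference is just =y vs =m for the personality the array uses (md-personality-3 is raid1, so that's what's shown -- option names here are as they appear in 2.4-era trees, so double-check the exact spelling in a patched 2.2 config):

    CONFIG_BLK_DEV_MD=y
    CONFIG_MD_RAID1=y      # built in, not =m, so the root array can start before any fs is mounted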
Re: Problem with 2.2.x and RAID0
[ Saturday, March 4, 2000 ] Martin Schulze wrote: > I wonder why I can't get RAID0 aka striping work with 2.2.13. It only > runs with 2.0.36. old-style raid is no longer supported. You may wish to read the s/w raid howto at http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/ specifically, the "requirements section" (1.2) quick summary: patch kernel, get new raidtools, create raidtab, mkraid > # mdadd -ar > /dev/sdc2: No such device > /dev/sdd2: No such device > /dev/sde2: No such device > /dev/md0: No such device > > The appropriate SCSI driver is included, /dev/sda1 can be mounted without > a problem. As you can see, the MD driver is also included, thus it should > work. it still would appear that you have no valid sd[cde]2. perhaps fdisk -l /dev/sd[cde] output so we can see the partitions on those drives? also helpful would be your raidtab (mdtab in this case) contents and /proc/mdstat output My guess would be either /dev/sd[cde] aren't valid drives (for whatever reason) or they only have a single partition. Shot in the dark, of course, as there's not enough information to make a good assessment. Good luck! James
Re: Suggestion for mkraid
[ Friday, March 3, 2000 ] James Manning wrote: > [ Friday, March 3, 2000 ] Sander Flobbe wrote: > > In my kernel I did only include the module for raid-1. Then, when I try > > to create a raid-5 system it doesn't work: > > > > Okay, okay, my fault... but a tiny little cute hint about my mistake > > from mkraid would be nice, wouldn't it? :*) > > also nice would be your raidtab contents, /proc/mdstat output, syslog > messages, kernel version, patch used, raidtools used, etc, etc, etc Wow, I *really* needed some sleep *sigh* I can't even blame the crack since I quit last week. :) really, I did.. I swear! really! Ok, there was that one time behind the garage! shut up already! Yes, better and more descriptive error messages are always a good thing. After the merge is successfully done, that'd be a good priority for making sure s/w raid is as friendly as possible for 2.4-based distros. James
Re: kernel 2.3.4X raid0 performance problems
[ Friday, March 3, 2000 ] Karl Czajkowski wrote: > > how much memory in the machine? > > 256 MB > dual 550 MHz pentium III > > I did read other larger-than-memory files in between tests to try and > avoid caching effects. barely larger than memory doesn't count. It's easily argued that 2x memory isn't even good enough either :) 3x is really about the time it gets safe. Sadly, this will remain the case until we can avoid caching altogether, something I'm hoping (hey, someone tell me if this is a pipe dream :) mmap/madvise can do for us. James
Re: kernel 2.3.4X raid0 performance problems
[ Friday, March 3, 2000 ] Karl Czajkowski wrote: > I upgraded the kernel to 2.3.47, 48, and 49 and got a performance > problem where "time cat file ... > /dev/null" for a 300 MB file shows > some scaling, but for a 600 MB file the throughput is almost identical to > a single disk. how much memory in the machine? > is there a known scheduling problem with the 2.3.4X kernel raid vs. the > 2.2.12-20 patches distributed by redhat? I need the new kernel for > ethernet patches... The 2.3.4x raid merge isn't finished yet, but I'm surprised raid0 isn't working as well as it sounds like it should. > I also noticed that the "boot with raid" option in the kernel won't compile > properly in the 2.3.4X series. It should once the merge is finished. James
Re: 16/02 Raid1 Benchmark
[ Friday, March 3, 2000 ] Ricky Beam wrote: > As I understand it, the "stride" will only make a real difference for > fsck by ordering data so it's (more) evenly spread over the array. This > sounds correct and even "looks" correct when observing the array -- but > I've never bothered to look at the file system handling of striding. I'd always imagined it allowed the ext2 layer to aggregate data blocks (the number to aggregate being the stride param) before passing the blocks to the md layer, making things more efficient since the md layer wouldn't have to do the same aggregation and could simply pass down a single block. Nothing based on looking through code, just an impression. It'd be good to know, actually :) Seems like you'd ideally like ext2 to pass down the data in full-stripe sizes, but that could be asking a bit much. James
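P.S. For anyone following along, the stride value is just handed to mke2fs when the filesystem is created; an illustrative run (a hypothetical 32k-chunk array with 4k ext2 blocks, so stride = 32/4 = 8 -- the numbers are made up for the example):

    mke2fs -b 4096 -R stride=8 /dev/md0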
Re: Suggestion for mkraid
[ Friday, March 3, 2000 ] Sander Flobbe wrote: > In my kernel I did only include the module for raid-1. Then, when I try > to create a raid-5 system it doesn't work: > > Okay, okay, my fault... but a tiny little cute hint about my mistake > from mkraid would be nice, wouldn't it? :*) also nice would be your raidtab contents, /proc/mdstat output, syslog messages, kernel version, patch used, raidtools used, etc, etc, etc I don't get my mind-reading certification until next semester. :) James
Re: autorun
[ Friday, March 3, 2000 ] Steve Terrell wrote: > I have been using raid1 0.090-5 (kernel 2.2.14 w/ raid patch) on a > couple of RedHat 6.1 boxes for several weeks with good results. > Naturally, when I installed it on a production system, I ran into > problems. Raid1 arrays work fine - after the machine (Redhat 6.0 kernel > 2.2.14 w/patch) is up and running. However, autorun does not work even > though autodetect was compiled and the partitions are type fd. > > Anyone got a clue? Did you enable autodetection? paste the "autorun" section of the bootup log. James
Re: What program do I use for benchmarking?
[ Friday, March 3, 2000 ] bug1 wrote:
> there are a few benchmark progs around:
> bonnie: old benchmark program
> bonnie++: updated bonnie to reflect modern hardware
> tiotest: looks promising, still being developed
> iozone: haven't tried this, but www.iozone.org shows it can do pretty graphics, and also has a long feature list.
If anyone happens to know a command-line capability or version of iozone, please let me know... tons of NT benchmarking I could automate the hell out of once I find it :) > tiotest has been getting a lot of attention around here lately, so maybe > you should give it a go. Yes, please! :) tiotest sprang up specifically for s/w raid testing (though it's not specific to that, at least not yet :) and the more pounding we can do, the better. Feedback about USE_MMAP in tiotest and whether it causes any significant changes is also very desired. I'm hoping Linus will finally accept Chuck Lever's mincore() (and later, madvise()) patch, solely so we can possibly get to the point where we can efficiently benchmark without caching effects. This is currently, IMHO, the weakness in all methods of Linux i/o benchmarking... James
Re: are there archives or FAQ's?
[ Thursday, March 2, 2000 ] Derek Shaw wrote: > I've re-compiled the kernel to have md support at RAID-1 included in > ftp.fi.kernel.org/pub/linux/daemons/raid/alpha/ fetch the patch (raid0145) for kernel 2.2.11 and apply it to your kernel source (since you said 2.2.13) and ignore rejects If you decide on 2.2.14, use: http://people.redhat.com/mingo/raid-patches/raid-2.2.14-B1 Since you referenced Jakob's howto, I'll note that this is covered in section 1.2 "requirements" which you appear to have at least partially read based on the ftp location you used. :) James
Re: FW: ExtremeRAID 1100 benchmarks
[ Thursday, March 2, 2000 ] Kenneth Cornetet wrote: > I wished someone would port Bonnie (or tiotest) to NT. ActivePerl + cygwin should work fine... if not, plz report specific issues (some ifdef's on the thread stuff should be about it) I still have the NTiogen re-write I did, and that'll be easy enough to rip code out of. James
Re: ExtremeRAID 1100 benchmarks
[ Thursday, March 2, 2000 ] Chris Mauritz wrote: > Has anyone done any benchmarks with the Mylex ExtremeRAID 1100? I'm > planning on getting one of the 3 channel ones with 64mb cache. Initially, > it will be delivered on a dual PIII-750mhz machine with NT, but I'd like to > repurpose this as a Linux file server. It will have an external enclosure > with 8 18gig 10,000rpm IBM Deskstars and one hot spare. Can anyone hazard a > guess at the kind of performance I can expect from such an array? This is a very similar setup to the 9-disk 10krpm raid5 extremeraid 1100 benchmarks I mailed the list awhile back... search back through some archives. James
tiotest patch to add mmap() and madvise() capabilities
By default not used (ppl just have to edit the DEFINES in their Makefile) but worth getting into the tree now for later tinkering (specifically, madvise() behavior checking and diff memory copy methods). If anyone happens to have or be running a kernel with chuck lever's madvise() patch, please try with and w/o -DUSE_MADVISE. Otherwise, seeing some good read/write vs. mmap() results should be interesting (although I get the feeling I could have done the memory copy's a little better... hmmm) James diff -ru tiotest-0.24/ChangeLog tiotest-0.24.mmap/ChangeLog --- tiotest-0.24/ChangeLog Wed Feb 16 10:25:16 2000 +++ tiotest-0.24.mmap/ChangeLog Thu Mar 2 02:34:14 2000 @@ -88,3 +88,8 @@ * 0.24 - prompt to STDERR and not printing ^H s any more - minor tiobench.pl cleanup by James + +2000-03-02 James Manning <[EMAIL PROTECTED]> + + * 0.25 - add optional use of mmap()-based IO ifdef'd on USE_MMAP + - add optional use of madvise() to control kernel paging USE_MADVISE diff -ru tiotest-0.24/Makefile tiotest-0.24.mmap/Makefile --- tiotest-0.24/Makefile Fri Feb 11 18:25:33 2000 +++ tiotest-0.24.mmap/Makefile Thu Mar 2 02:39:56 2000 @@ -3,6 +3,7 @@ CC=gcc #CFLAGS=-O3 -fomit-frame-pointer -Wall CFLAGS=-O2 -Wall +#DEFINES=-DUSE_MMAP -DUSE_MADVISE DEFINES= LINK=gcc EXE=tiotest diff -ru tiotest-0.24/tiotest.c tiotest-0.24.mmap/tiotest.c --- tiotest-0.24/tiotest.c Wed Feb 16 10:25:30 2000 +++ tiotest-0.24.mmap/tiotest.c Thu Mar 2 02:39:45 2000 @@ -19,7 +19,7 @@ #include "tiotest.h" -static const char* versionStr = "tiotest v0.24 (C) Mika Kuoppala <[EMAIL PROTECTED]>"; +static const char* versionStr = "tiotest v0.25 (C) Mika Kuoppala <[EMAIL PROTECTED]>"; /* This is global for easier usage. If you put changing data @@ -513,23 +513,46 @@ off_t blocks=(d->fileSizeInMBytes*MBYTE)/d->blockSize; off_t i; +#ifdef USE_MMAP +off_t bytesize=blocks*d->blockSize; /* truncates down to BS multiple */ +void *file_loc; +#endif + fd = open(d->fileName, O_RDWR | O_CREAT | O_TRUNC, 0600 ); if(fd == -1) perror("Error opening file"); +#ifdef USE_MMAP +ftruncate(fd,bytesize); /* pre-allocate space */ +file_loc=mmap(NULL,bytesize,PROT_READ|PROT_WRITE,MAP_SHARED,fd,0); +if(file_loc == MAP_FAILED) + perror("Error mmap()ing file"); +#ifdef USE_MADVISE +/* madvise(file_loc,bytesize,MADV_DONTNEED); */ +madvise(file_loc,bytesize,MADV_RANDOM); +#endif +#endif + timer_start( &(d->writeTimings) ); for(i = 0; i < blocks; i++) { +#ifdef USE_MMAP +memcpy(file_loc + i * d->blockSize,buf,d->blockSize); +#else if( write( fd, buf, d->blockSize ) != d->blockSize ) { perror("Error writing to file"); break; } - +#endif d->blocksWrite++; } +#ifdef USE_MMAP +munmap(file_loc,bytesize); +#endif + fsync(fd); close(fd); @@ -547,26 +570,44 @@ intfd; off_t blocks=(d->fileSizeInMBytes*MBYTE)/d->blockSize; off_t i; +#ifdef USE_MMAP +off_t bytesize=blocks*d->blockSize; /* truncates down to BS multiple */ +void *file_loc; +#endif fd = open(d->fileName, O_RDONLY); if(fd == -1) perror("Error opening file"); +#ifdef USE_MMAP +file_loc=mmap(NULL,bytesize,PROT_READ,MAP_SHARED,fd,0); +#ifdef USE_MADVISE +/* madvise(file_loc,bytesize,MADV_DONTNEED); */ +madvise(file_loc,bytesize,MADV_RANDOM); +#endif +#endif + timer_start( &(d->readTimings) ); for(i = 0; i < blocks; i++) { +#ifdef USE_MMAP +memcpy(buf,file_loc + i * d->blockSize,d->blockSize); +#else if( read( fd, buf, d->blockSize ) != d->blockSize ) { perror("Error read from file"); break; } - +#endif d->blocksRead++; } timer_stop( &(d->readTimings) ); +#ifdef MMAP +munmap(file_loc,bytesize); +#endif close(fd); return 
0; diff -ru tiotest-0.24/tiotest.h tiotest-0.24.mmap/tiotest.h --- tiotest-0.24/tiotest.h Fri Feb 4 14:40:27 2000 +++ tiotest-0.24.mmap/tiotest.h Wed Mar 1 14:19:14 2000 @@ -14,6 +14,10 @@ #include #endif +#ifdef USE_MMAP +#include +#endif + #define KBYTE 1024 #define MBYTE (1024*KBYTE)
Re: your mail
[ Wednesday, March 1, 2000 ] Christian Robottom Reis wrote: > James, when run tiotest with a size too small for the number of threads don't do that. (what'd be the purpose?) I'll add a boundary check later... but really, I'm not going to get into the habit of checking all possible inputs against pathological cases. don't do that. James
Re: Testing script
[ Wednesday, March 1, 2000 ] Christian Robottom Reis wrote: [snip] > # time we need to sleep before resync finishes - empirical? > snooze=5m [snip] > sleep $snooze # so the raid1 can sync in peace FWIW, If it's the only thing resync'ing you should be able to do: while grep resync /proc/mdstat > /dev/null; do sleep 10; done you can chain two grep's together or do an egrep pattern if you want to isolate on the particular device (don't have mdstat output during a resync handy at the moment) James
Re: Benchmark 1 [Mylex DAC960PG / 2.2.12-20 / P3]
[ Wednesday, March 1, 2000 ] Christian Robottom Reis wrote: > On Wed, 1 Mar 2000, James Manning wrote: > > per-char doesn't matter (one of the reasons I hate ppl using bonnie, > > besides the single-threaded-ness). Considering the queueing and scat/gat > > Why not? Because usual disk operations are done block by block? That, and because per-char stresses the OS and stdio implementation far more than the drive itself. There was a little rant about it awhile back on lkml or here. James
Re: tiotest, --numruns
[ Wednesday, March 1, 2000 ] Christian Robottom Reis wrote: > I've seen a lot of variation on various runs of tiotest using the same > setup - even in single-user mode. Is this expected, and do you know why it > happens? Is it just the effect of the buffer cache, or do we avoid using > it? We don't avoid using it currently. Since I can find neither a 2.2 nor a 2.3 that has working i386 madvise(), it could be awhile :) > What's a decent --numruns to use, taking into evidence such > variation? I've noticed if I use more than one I get worse numbers in > general - this is ok? If you don't trust numruns > 1, don't use it :) It may be worth watching "vmstat 1" output during a run just so you can get an idea of the memory/caching interaction that's going on. I'd like to believe that higher numruns further reduces the effect of memory... for numruns=1,2,4 my numbers come out pretty close
Dir    Size  BlkSz  Thr#  Read (CPU%)      Write (CPU%)     Seeks (CPU%)
------ ----- ------ ----- ---------------- ---------------- ---------------
.       512   4096     4  6.81976   9.47%  6.95052   10.3%  164.596  1.81%
.       512   4096     4  6.72370   8.35%  6.88223   10.3%  165.602  1.84%
.       512   4096     4  6.69172   7.37%  6.83409   10.5%  169.500  1.89%
James
Re: Benchmark 1 [Mylex DAC960PG / 2.2.12-20 / P3]
[ Wednesday, March 1, 2000 ] Ricky Beam wrote: > > ---Sequential Output ---Sequential Input-- --Random-- > > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- > > MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > > 256 5451 78.7 10035 8.5 4000 7.3 3975 55.3 18765 11.3 262.8 3.9 > > That's a lot of CPU being used for a hardware RAID device. per-char doesn't matter (one of the reasons I hate ppl using bonnie, besides the single-threaded-ness). Considering the queueing and scat/gat the driver is probably trying (not to mention caching, esp. in the read case), 7.3-11.3% seems acceptable for the block stuff. I need to check to see if madvise() has been backported to 2.2.x, as MADV_RANDOM may help cut down or eliminate memory caching effects... it'd be nice to get (approx) the same numbers from 100MB and 1000MB test runs, regardless of the amount of memory in the machines :) James
Re: What version of the raidtools and other patches do I need?
[ Wednesday, March 1, 2000 ] Brian Kress wrote: > Either use your current kernel with that patch or get 2.2.14 > and grab the patch at http://www.redhat.com/~mingo/raid. Or for a working url :) http://www.redhat.com/~mingo/raid-patches/raid-2.2.14-B1 James
Re: RaidZone software raid
[ Tuesday, February 29, 2000 ] Hector Herrera wrote: > Has anyone on this list used any of Raidzone's products? > > http://www.raidzone.com/ Already brought up fairly recently... check some archives such as mail-archive.com or similar James
Re: persistent superblock in HOWTO / raidtools
[ Tuesday, February 29, 2000 ] Brian Lavender wrote: > mammoth:/# mkraid /dev/md0 > unrecognized option peristent-superblock Try using a spell checker :) James
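P.S. For the archives, the line mkraid was looking for is spelled like this in /etc/raidtab:

    persistent-superblock 1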
Re: Benchmark 1 [Mylex DAC960PG / 2.2.12-20 / P3]
[ Tuesday, February 29, 2000 ] Christian Robottom Reis wrote: > /proc/rd/ relevant information: > > * DAC960 RAID Driver Version 2.2.4 of 23 August 1999 * > Copyright 1998-1999 by Leonard N. Zubkoff <[EMAIL PROTECTED]> > Configuring Mylex DAC960PG PCI RAID Controller > Firmware Version: 4.06-0-08, Channels: 1, Memory Size: 4MB Try updating your firmware, it may help Configuring Mylex DAC1164P PCI RAID Controller Firmware Version: 5.07-0-79, Channels: 2, Memory Size: 64MB James
Re: tiotest 0.21/0.24
[ Tuesday, February 29, 2000 ] Christian Robottom Reis wrote: > James, I've run a whole truckload of benchmarks on raid1 with varying > chunksizes on three different kernels, and on a plain disk. I'm about to > publish some of the stuff, but I'm wondering very hard why is it that the > readbalancing test showed _awful_ numbers on tiotest 0.21 and great > numbers on 0.24 - any idea? Just have a look: I'm not going to actually wade through these numbers... just far too many to really deal with :) tiotest.c has only changed cosmetically; tiobench.pl's only non-cosmetic change was in stat calculation to allow for multiple runs in an efficient and harmonic (literally) way. Unfortunately for this case, it becomes the same calculation as before (first number divided by second in tiotest output) for num_runs == 1. Tell ya what, pick out an isolated case which is heavily reproducible, print out the tiobench output, then print out the tiotest output. James
Re: Chunk and Stripe for RAID1
[ Tuesday, February 29, 2000 ] Christian Robottom Reis wrote: > I've got the simple scripts I used to do the benchmarks here and if > somebody wants to have a look, feel free. go ahead and mail them to the list as attachments. Might make for more scripts to shove into tiotest/funnyscripts/ James
Re: strange syslog messages about overlapping physical units
[ Tuesday, February 29, 2000 ] Christian Robottom Reis wrote: > On Mon, 14 Feb 2000, Peter Pregler wrote: > > All is fine but during reconstruction I get a few syslog-messages that I > > simply cannot believe are true. The message in question are: > > > > Feb 12 11:31:52 kludge kernel: md: serializing resync, md8 has overlapping > > physical units with md9! > > Just means both md partitions have component partitions on the same drive > - isn't this in the faq, Jakob? They have to be serialized because the > bandwidth for sync is rather limited and it'd be thrashing to let the > resync go by in parallel. Or restated "you wouldn't want to bother with all the wasted seeks between the two sections of disk, so you serialize the resyncs" James
Re: Adaptec RAID
[ Tuesday, February 29, 2000 ] Andrew G Milne wrote: > I have an Adaptec 4-channel raid controller. I have just got the > drivers from Dell for this card and it turns out that they have been > statically compiled for a specific version of the kernel. I need to use > the raid array as a boot device (which the driver allows) but I don't > have a boot diskette (or CD!) that has this version of the kernel. I > have tried using the version that I have got, but the driver doesn't > load. - which kernel(s) do you have - which kernel does it require - distribution? I'd guess it's an RH kernel avail off of redhat.com, but matching kernel ver, CONFIG_SMP, CONFIG_MODVERSIONS might be enough to get a module that should at least work if insmod -f'd James
Re: Cookbook way to set up raid1
[ Monday, February 28, 2000 ] Brian Lavender wrote: > The software-RAID Howto is very _unclear_. Did you read this one? The LDP one is ancient (long story) http://ostenfeld.dk/~jakob/Software-RAID.HOWTO/
Re: set block_size question after a power failure
[ Sunday, February 27, 2000 ] [EMAIL PROTECTED] wrote: > e2fsk -f -b 32768 /dev/md0 to repair using the superblocks wouldn't this be pointing the fsck at a non-superblock? Perhaps not, but in my exp. superblocks are typically on 2**n+1 Did you have some indication your primary superblock was corrupted? James
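P.S. One way to sanity-check where the backups should live before feeding -b to e2fsck (illustrative only -- match the -b block size to what the filesystem was actually built with, and -n only prints, it never writes):

    mke2fs -n -b 4096 /dev/md0     # lists the backup superblock locations it *would* use
    e2fsck -f -b 32768 /dev/md0    # then point e2fsck at one of the listed backups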