Re: md0 won't let go...

2000-05-11 Thread D. Lance Robinson

Harry,

Can you do simple things with /dev/hdl like... ?

   dd count=10  if=/dev/hdl of=/dev/null

It might help to see your device entry and other information. Can you give us
the output of the following?

   ls -l  /dev/hdl
   cat /etc/mtab
   cat /proc/mdstat
   cat /etc/mdtab
   dd count=10 if=/dev/hdl of=/dev/null

This gives us an overall picture of what we are up against.

Thanks.
<>< Lance.





Re: raid1 question

2000-05-05 Thread D. Lance Robinson



Ben Ross wrote:

> Hi All,
>
> I'm using a raid1 setup with the raidtools 0.90 and mingo's raid patch
> against the 2.2.15 kernel.

...

> My concern is if /dev/sdb1 really crashes and I replace it with another
> fresh disk, partition it the same as before, and do a resync, everything
> on /dev/sdc1 (raid-disk 1) will be deleted.

There is a big difference between a resync after a mkraid or dirty restart and a resync
to a spare disk. When resyncing to a spare, the array is in degraded mode: the driver
knows which disks hold valid data and reads only from them. The spare is only written to,
and is not read from until the resync completes. In the case of an mkraid or dirty
restart, the driver picks one disk to read from and sticks with it, for consistency's
sake, until the resync is complete.

<>< Lance.





Re: Please help - when is a bad disk a bad disk?

2000-04-11 Thread D. Lance Robinson



Darren Nickerson wrote:

>   +> 4. is there some way to mark this disk bad right now, so that
>   +> reconstruction is carried out from the disks I trust? I do have a hot
>   +> spare . . .
>
>   Lance> You can use the 'raidhotremove' utility.
>
> This has never worked for me when the disk had not been marked as faulty by
> the RAID subsystem. Just says the disk is bizzy. That's why I was looking to
> set it faulty.

In that case, I would:
1) Do a normal shutdown of the machine
2) Disconnect the bad drive
3) Power up the system

If the array starts, the missing disk will automatically be marked as removed from the
array. If the array doesn't start, try changing the raidtab file to match the new disk
assignments. If that fails, put the disk back in and reboot, disconnect the power from
the bad disk while it is idle, and then access the file system.

<>< Lance.




Re: Please help - when is a bad disk a bad disk?

2000-04-11 Thread D. Lance Robinson

I hope this helps. See below.

<>< Lance.


> my questions are:
>
> 2. the disk seems to be "cured" by re-enabling DMA . . . but what is the state
> of my array likely to be after the errors above? Can I safely assume this was
> harmless? I mean, they WERE write errors after all, yes? Is my array still in
> sync? Is there any way to tell other than by unmounting the array and fscking?

> 3. is the failure simply not sufficiently severe to trigger removal from the
> array and hot reconstruction onto the host spare which is available?
>

The md driver calls the block driver for the specific device. It is there (or lower)
that all media error detection and retries are performed. If a request made by the md
driver fails (the buffer is not marked uptodate), the md driver assumes the device is
bad and stops communicating with it; no retries are attempted. There is one exception:
the md driver will do some retries while doing a resync, but no retries are attempted
under normal working conditions.

So, if the lower-level device drivers for the IDE devices are working correctly,
performing retries when needed and delivering the data as requested, the md driver never
knows about any hiccups along the way. This is good and bad: good in that the md driver
doesn't need to worry about different types of devices and their peculiar behavior, but
bad in that the md driver cannot predict device failures due to flaky or deteriorating
hardware.

So, if the md driver doesn't fail a drive, it is because the lower levels have taken
care of all the nitty-gritty details and have supposedly performed the requested data
transfer correctly. As long as the underlying device drivers complete the requests, the
md driver won't know about any problems.
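To make that flow concrete, here is a tiny standalone sketch of the policy described
above (illustrative only; all names are made up and this is not the actual md source):

/* Illustrative sketch only, not the actual md code. */
struct fake_disk { int faulty; };

static void handle_completed_request(struct fake_disk *disk, int uptodate)
{
        if (uptodate) {
                /* The lower-level driver already performed any retries it
                 * needed; md simply accepts the data and never sees the
                 * hiccups along the way. */
                return;
        }
        /* No retries at the md level (except during resync): the disk is
         * marked faulty and no further requests are sent to it. */
        disk->faulty = 1;
}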



>
> 4. is there some way to mark this disk bad right now, so that reconstruction
> is carried out from the disks I trust? I do have a hot spare . . .
>

You can use the 'raidhotremove' utility.





Re: Raid1 - dangerous resync after power-failure?

2000-03-30 Thread D. Lance Robinson

The event counter (and serial number) only indicates which superblock is the most
current. The SB_CLEAN bit is cleared when an array gets started, and is set when it is
stopped (this automatically happens during a normal shutdown). But if the system crashes
or the power gets yanked, the SB_CLEAN bit will be zero, so the next reboot will trigger
a resync to guarantee the array is in sync. As far as the md driver knows, you could have
been doing heavy I/O when the system went down, leaving the array out of sync. There is
possibly a way to set SB_CLEAN during long idle periods, but then it would have to be
cleared before doing any more I/O (I/O which might get interrupted).
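As a rough illustration of that bookkeeping (made-up names, not the actual md code):

/* Illustrative sketch only. */
struct fake_sb { unsigned int state; };
#define FAKE_SB_CLEAN 0x1

static int array_start(struct fake_sb *sb)   /* returns 1 if a resync is needed */
{
        int was_clean = sb->state & FAKE_SB_CLEAN;

        sb->state &= ~FAKE_SB_CLEAN;  /* while running, the array is "dirty"     */
        return !was_clean;            /* dirty at start => crash or power loss   */
}

static void array_stop(struct fake_sb *sb)
{
        sb->state |= FAKE_SB_CLEAN;   /* a normal shutdown sets the bit again    */
}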

<>< Lance.


Sam Horrocks wrote:

> I agree, if the two disks are truly out of sync
> then the only thing you can do is copy the most recent
> data to the out of date disk.
>
> But what I'm seeing is that the two disks are in
> sync (at least according to the serial numbers in the
> superblock), but due to the SB_CLEAN flag not having
> been set to true, the code decides to do a resync anyways,
> regardless of the fact that both discs are apparently
> in-sync.
>
> And this resync is dangerous - it copies over good data.




Re: Raid1 - dangerous resync after power-failure?

2000-03-30 Thread D. Lance Robinson

It is a very bad idea to prevent resyncs after a volume has possibly become out of sync.
It is important to have the disks in sync, even if the data is the wrong data. The way
raid-1's balancing works, you don't know which disk will be read: for the same block, the
system may read different disks at different times. This type of inconsistency is worse
than starting with bad data. Fsck can correct most inconsistencies in the data, but if
differing data is hidden on one of the mirrors, fsck cannot do anything about it, and the
bad data will eventually come out.

Also, with raid-5, if the array is not in sync and has a disk failure (and thus goes into
degraded mode), the data reconstructed using bad parity will be bad data.

<>< Lance.

Sam wrote:

> OK, regardless of how the failure occurs,  my point is that a
> resync is a potentially dangerous operation if you don't
> know beforehand whether the source disk has bad sectors or not.
> So I don't think a resync should be performed except when
> absolutely necessary, or unless the source disk is known to
> be absolutely free from errors.
>
> Can someone answer my original question which was:
>
>  Could the SB_CLEAN flag be eliminated to reduce the
>  risk of a resync damaging good data?




Re: reconstruction problem.

2000-03-16 Thread D. Lance Robinson

>
> i have set up an md (raid1) device. it has two hard disks.
>
> Something has gone bad on the disks, such
> that whenever I do a raidstart or mkraid, it
> says
>   raid set md0 not clean. starting background reconstr.. ..
>
> what can I do to clean my md device.

If the raid device isn't stopped correctly, it will be dirty and will require
a resync at the next startup. A reconstruction is also done when the array
is initially created. This reconstruction step is actually unnecessary
for RAID-1 (though it is vital for RAID-5), but the raid driver does it anyway.
Also, you must allow a resync to finish completely before stopping the
array; otherwise, the resync will start all over again the next time the
array is started.


> mkraid --really-force also is not helpful .
> i have tried destroying both hard disk partitions using fdisk
> and doing a clean raid setup, still it starts the background
> reconstruction.
>
> also what is the command to do a low level format of the harddisk
> in linux?

There is no built-in command to format a disk. You can easily zero out
the data by doing something like 'dd if=/dev/zero of=/dev/sda'. If you are using SCSI,
then there is a scsiformat utility with the scsiinfo package, but this
may need fiddling to compile.

<>< Lance.



Re: SV: SV: raid5: bug: stripe->bh_new[4]

2000-03-03 Thread D. Lance Robinson

Johan,

Thanks for sending the bulk of information about this bug. I have never seen the buffer
bug when running local loads, only when using NFS. The bug appears more often when
running with 64MB of RAM or less, but it has also been seen with more memory.

Below is a sample of the errors seen while doing tests. Very interesting is that the
same sector (26272) hit the problem twice within 5 minutes, each time with different
buffers.

These all look like potential data corruption, since multiple buffers are assigned to
the same physical block. I have seen corruption, but it seems to come from the NFS
client, not the server side.

Hopefully this problem will get resolved soon, but it looks like it has been with us
for some time now (two years).

<>< Lance.


Mar  1 22:33:10 src@lance-v raid5: bug: stripe->bh_new[2], sector 26272 exists
Mar  1 22:33:10 src@lance-v raid5: bh c100b680, bh_new c0594bc0
Mar  1 22:37:32 src@lance-v raid5: bug: stripe->bh_new[2], sector 26272 exists
Mar  1 22:37:32 src@lance-v raid5: bh c2d1be60, bh_new c1edcea0
Mar  1 22:42:41 src@lance-v raid5: bug: stripe->bh_new[3], sector 360880 exists
Mar  1 22:42:41 src@lance-v raid5: bh c1777840, bh_new c180
Mar  2 03:26:37 src@lance-v raid5: bug: stripe->bh_new[2], sector 1792 exists
Mar  2 03:26:37 src@lance-v raid5: bh c0549240, bh_new c0ed30c0
Mar  2 09:07:38 src@lance-v raid5: bug: stripe->bh_new[0], sector 293016 exists
Mar  2 09:07:38 src@lance-v raid5: bh c20150c0, bh_new c2015600
Mar  2 14:10:08 src@lance-v raid5: bug: stripe->bh_new[2], sector 42904 exists
Mar  2 14:10:08 src@lance-v raid5: bh c084c5c0, bh_new c262b8a0





Re: still get max 12 disks limit

2000-02-29 Thread D. Lance Robinson

Perhaps if you also modify MAX_REAL in the md_k.h file to 15, it will accept more
than 12 disks. This value is only used by raid0.
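For illustration, the change would look something like this (double-check the constant
name against your kernel and patch version):

/* in linux/include/linux/raid/md_k.h */
#define MAX_REAL  15    /* was 12: max real devices per array (only used by raid0) */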

<>< Lance.

[EMAIL PROTECTED] wrote:

> i tried this but mkraid still gives the same error of "a maximum of 12
> disks is supported."
> i set MD_SB_DISKS_WORDS to 480 to give me 15 disks.
> does anyone have more detailed instructions?
>
> looking in parser.c i see where that should have worked, but i guess i'm
> missing something.  it parses down to /dev/hdn which is my 13th disk and
> then gives the error, so somehow MD_SB_DISKS is still 12 not 15.  weird.
> i did a "make install" for raidtools and verified that it updated the
> binaries.
>
> On Tue, 29 Feb 2000, TAKAMURA Seishi wrote:
>
> > Dear Eldon,
> >
> > You seem to use RAID0, and I use RAID5, so just FYI.  I changed both
> > kernel code and raidtool code to increase disk limit.  Quick and dirty
> > way (which I did) is modify MD_SB_DISKS_WORDS appropriately in the
> > following two header files.
> >   raidtools-0.90/md-int.h
> >   linux/include/linux/raid/md_p.h
> > (MD_SB_DISKS_WORDS/32 = maximum drive number)
> >
> > With this modification, I am now using an array with 24 disks(1.0TB).
> >
> > > On  Mon, 28 Feb 2000 16:52:16 -0600 (CST)
> > > [EMAIL PROTECTED] said:
> > >
> > >
> > > I've been told (by Jakob) that the limit of 12 disks per md device is just
> > > a typo in the code.  I'm trying to make a 14-drive linear array.  I tried
> > > changing MAX_REAL from 12 to 14 in md_k.h (and then even recompiled
> > > raidtools) but mkraid still complains about the limit being 12.  is there
> > > any way around this (safely)?
> > >
> > > ps i also tried nesting 2 md's inside one ... locked up the machine, so i
> > > don't feel good about that approach.
> > >
> > > i'm using 2.2.15
> > >
> > > Eldon
> >
> > Seishi Takamura, Dr.Eng.
> > NTT Cyber Space Laboratories
> > Y922A 1-1 Hikarino-Oka, Yokosuka, Kanagawa, 239-0847 Japan
> > Tel: +81-468-59-2371, Fax: +81-468-59-2829
> > E-mail: [EMAIL PROTECTED]
> >



Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure =problems ?

2000-01-14 Thread D. Lance Robinson

Ingo,

I can fairly regularly generate corruption (data or ext2 filesystem) on a busy
RAID-5 by adding a spare drive to a degraded array and letting it build the
parity. Could the problem be from the bad (illegal) buffer interactions you
mentioned, or are there other areas that need fixing as well? I have been
looking into this issue for a long time without resolution. Since you may be aware
of possible problem areas: any ideas, code, or encouragement are greatly welcome.

<>< Lance.


Ingo Molnar wrote:

> On Wed, 12 Jan 2000, Gadi Oxman wrote:
>
> > As far as I know, we took care not to poke into the buffer cache to
> > find clean buffers -- in raid5.c, the only code which does a find_buffer()
> > is:
>
> yep, this is still the case. (Sorry Stephen, my bad.) We will have these
> problems once we try to eliminate the current copying overhead.
> Nevertheless there are bad (illegal) interactions between the RAID code
> and the buffer cache, i'm cleaning up this for 2.3 right now. Especially
> the reconstruction code is a rathole. Unfortunately blocking
> reconstruction if b_count == 0 is not acceptable because several
> filesystems (such as ext2fs) keep metadata caches around (eg. the block
> group descriptors in the ext2fs case) which have b_count == 1 for a longer
> time.



Re: large ide raid system

2000-01-11 Thread D. Lance Robinson

SCSI works quite well with many devices connected to the same cable. The PCI bus
turns out to be the bottleneck with the faster SCSI modes, so it doesn't matter
how many channels you have. If performance were the issue (the original poster
wasn't interested in performance), multiple channels would improve performance
when the slower (single-ended) devices are used.

<>< Lance

Dan Hollis wrote:

> Cable length is not so much a pain as the number of cables. Of course with
> scsi you want multiple channels anyway for performance, so the situation
> is very similar to ide. A cable mess.



Re: Swapping Drives on RAID?

2000-01-11 Thread D. Lance Robinson

Scott,

1.  Use raidhotremove to take out the IDE drive.  Example:
raidhotremove /dev/md0 /dev/hda5
2.  Use raidhotadd to add the SCSI drive.  Example: raidhotadd /dev/md0
/dev/sda5
3.  Correct your /etc/raidtab file with the changed device.

<>< Lance.

Scott Patten wrote:

> I'm sorry if this is covered somewhere.  I couldn't find it.
>
> 1 - I have a raid1 consisting of 2 drives.  For strange
> historical reasons one is SCSI and the other IDE.  Although
> the IDE is fairly fast the SCSI is much faster and since I
> now have another SCSI drive to add, I would like to replace
> the IDE with the SCSI.  Can I unplug the IDE drive, run in
> degraded mode, edit the raid.conf and somehow mkraid
> without loosing data or do I need to restore from tape.
> BYW, I'm using 2.2.13ac1.
>



Re: new raid5 says overlapping physical units....

2000-01-07 Thread D. Lance Robinson

Roland,

The messages are not to be feared. To prevent thrashing on a drive between multiple
resync processes, the raid resync routine checks to see if any of the disks in the
array are already active in another resync. If so, then it waits for the other process
to finish before starting. Thus, the resync processes are serialized when a disk is
shared between raid arrays.

<>< Lance.

Roland Roberts wrote:

> I "recently" installed stock RedHat 6.1 and configured with root RAID1
> and everything else RAID5.  I have 4 U2 LVD SCSI drives on two
> controllers.  RedHat plays games with the partition layouts when I try
> to use its graphical tool, so I ended up partitioning the disks with
> fdisk.
>
> After I first installed RedHat and allowed it to build the RAID
> devices from my individual partitions, I got the following disturbing
> messages in syslog:
>
> Dec 17 18:41:30 kernel: md: serializing resync, md5 has overlapping physical units with md6!
> Dec 17 18:41:30 kernel: md: serializing resync, md4 has overlapping physical units with md6!
> Dec 17 18:41:31 kernel: md: serializing resync, md3 has overlapping physical units with md6!



Re: Adding a spare-disk (continued)

1999-12-25 Thread D. Lance Robinson

Hi,

By the mdstat shown below, you have a 3-drive raid-5 device with one spare. The
[0], [1] and [2] indicate the raid role of the associated disks. Values of [3]
or higher are the spares (for a three-disk array). In general, in an 'n'-disk
raid array, [0]..[n-1] are the disks that are in the array holding data, and [n]...
are the spares, as shown in /proc/mdstat.

You are in good shape for the hda2 disk to kick in as the spare if one of the
other disks fails.

<>< Lance.

Johan Ekenberg wrote:

> I recently inquired about adding a spare-disk to an operating RAID-5 array,
> and was given the advice to use raidhotadd. I've tried this and want to make
> sure that the result is the one I should expect. I thought that spare disks
> would show up as an "unused device" in /proc/mdstat, but that may not be the
> case???
>
> This is my mdstat:
> Personalities : [linear] [raid0] [raid1] [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 hda2[3] sdc2[2] sdb2[1] sda2[0] 8305408 blocks level 5,
> 32k chunk, algorithm 2 [3/3] [UUU]
> unused devices: 
>
> The spare disk in this case is hda2[3], defined as a spare in /etc/raidtab.
> Is this the way it should look? Can I be confident that hda2 will kick in if
> one of the sd* fails? hda2 is of course formated exactly like the other
> partitions.



Re: Help:Raid-5 with 12 HDD now on degrade mode.

1999-12-20 Thread D. Lance Robinson

Makoto,

The normal raid driver only handles 12 disk entries (or slots). Unfortunately, a
spare disk counts as another disk slot, and you need a spare slot to rebuild the
failed disk. With your setup of a 12-disk raid 5, you have already used up all the
available disk slots.

To recover your 12-disk raid 5 system, you will need to modify your kernel and
raid tools to accommodate more disks. Fortunately, the reason there is currently a
12-disk limit is an erroneous calculation, and there is room for many more
disks (I don't remember the actual limit, but it is over 24). There has been some
talk of this subject in the past. If you look in the list archive for the thread
"the 12 disk limit", there is some information on what needs to be done to modify
the kernel.

This brings up a question, though: can an existing 12-disk-limited raid superblock
work with a kernel that supports more than 12 disks? I'd think so, since the
unused areas are zeroed out. I don't know of anybody who has tried it though.

The tools should, but don't, limit the number of devices in a raid 5 array to one
less than the maximum disk slots in the raid superblock, so that the last slot can
be used as a spare. Unfortunately, you ran into this trap.

Good luck, <>< Lance.


Makoto Kurokawa wrote:

> Hello, All.
>
> I have a trouble of HDD fail of raid-5,raid-0.90 on Redhat 6.0.
>
> Raid-5 is now working on degrade mode.
> Exactly, I can't repair or replace the failed HDD (to new HDD).
> Woule you tell me how to do recovery it?
>
> "/proc/mdstat" is as follows:
>
> [root@oem /root]# cat /proc/mdstat
> Personalities : [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 sdm1[11] sdl1[10] sdk1[9] sdj1[8] sdi1[7] sdh1[6] sdg1[5]
> sdf1[4] sde1[3] sdd1[2] sdc1[1] 97192128 blocks level 5, 4k chunk, algorithm 2
> [12/11] [_UUU]
> unused devices: 
>
> "sdb1[0]" is failed, I think.
>
> "/etc/raidtab" is as follows:
>
> # Sample raid-5 configuration
> raiddev /dev/md0
> raid-level  5
> nr-raid-disks   12
> chunk-size  4
>
> # Parity placement algorithm
>
> #parity-algorithm   left-asymmetric
>
> #
> # the best one for maximum performance:
> #
> parity-algorithmleft-symmetric
>
> #parity-algorithm   right-asymmetric
> #parity-algorithm   right-symmetric
>
> # Spare disks for hot reconstruction
> #nr-spare-disks  0
>
> device  /dev/sdb1
> raid-disk  0
>
> device  /dev/sdc1
> raid-disk  1
>
> device  /dev/sdd1
> raid-disk  2
>
> device  /dev/sde1
> raid-disk  3
>
> device  /dev/sdf1
> raid-disk  4
>
> device  /dev/sdg1
> raid-disk  5
>
> device  /dev/sdh1
> raid-disk  6
>
> device  /dev/sdi1
> raid-disk  7
>
> device  /dev/sdj1
> raid-disk  8
>
> device  /dev/sdk1
> raid-disk  9
>
> device  /dev/sdl1
> raid-disk  10
>
> device  /dev/sdm1
> raid-disk  11
>
> First, I restarted  the PC and tryed "raidhotadd" and "raidhotremove" ,the
> result is as fllows:
>
> [root@oem /root]# raidhotadd /dev/md0 /dev/sdb1
> /dev/md0: can not hot-add disk: disk busy!
>
> [root@oem /root]# raidhotremove /dev/md0 /dev/sdb1
> /dev/md0: can not hot-remove disk: disk not in array!
>
> Next, I replaced HDD,/dev/sdb to new HDD, the result, system hung-up on boot
> time.
>
> With the message, "/dev/md0 is invalid."
>
> what should I do to recovery the Raid-5 from degrade-mode to normal mode?
>
> Makoto Kurokawa
> Engineer, OEM Sales Engineering
> Storage Products Marketing, Fujisawa, IBM-Japan
> Tel:+81-466-45-1441 FAX:+81-466-45-1045
> E-mail:[EMAIL PROTECTED]



Re: kernel SW-RAID implementation questions

1999-11-12 Thread D. Lance Robinson

There is a constant specifying the maximum number of md devices. But,
there is no variable stating how many active md devices are around. This
wouldn't make much sense anyway since the md devices are not allocated
sequentially. You can start with md3, for example.

You can have a program analyze the /proc/mdstat file to see what md
device numbers are currently active and thus not available for new
devices.

<>< Lance.


Thomas Waldmann wrote:
> 
> Is there a variable containing the md device count (md0, md1, ..., mdn. n == ?)
> ?



Re: Tuning readahead

1999-11-12 Thread D. Lance Robinson

Attached is a program that will let you get or set the read ahead value
for any major device. You can easily change the value and then do a
performance test.
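For anyone without the attachment, here is a minimal sketch of such a tool, assuming
the standard BLKRAGET/BLKRASET block-device ioctls (the attached readahead.c may
differ):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKRAGET, BLKRASET */

int main(int argc, char **argv)
{
        long ra;
        int fd;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <device> [new_readahead_sectors]\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        if (argc > 2) {
                /* setting the value usually requires root */
                if (ioctl(fd, BLKRASET, atol(argv[2])) < 0) { perror("BLKRASET"); return 1; }
        }
        if (ioctl(fd, BLKRAGET, &ra) < 0) { perror("BLKRAGET"); return 1; }
        printf("%s: read ahead = %ld sectors\n", argv[1], ra);
        close(fd);
        return 0;
}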

<>< Lance.


Jakob Østergaard wrote:
> 
> Hi all !
> 
> I was looking into tuning the readahead done on disks in a RAID.
> It seems as though (from md.c) that levels 0, 4 and 5 are handled
> in similar ways.
> 
> The readahead is set to chunk_size*4 per disk, and then increased
> to 1024*MAX_SECTORS = 1024*128 = 128k if the above equation yielded
> a result lower than this.
> 
> So besides from changing the chunk size to something bigger, is there
> any way the readahead can be tuned ?   Should (and could I safely) just
> change the equation in md.c ?
 readahead.c


Re: Uping the limit of drives in a single raid.

1999-09-30 Thread D. Lance Robinson

Jakob Østergaard wrote:

> IIRC the 12 disk limit is a ``feature''. Actually you can have up to 15 disks. Simply
> grep for the 12 disk constant in the raidtools and flip it up to 15.  You can't go
> further than that though.

I forget the exact number, but Ingo said that the drive count can be
changed to (23-28?) -- somewhere in there, and that change may precede
the 250 disk limit.

<>< Lance.



Re: Slower read access on RAID-1 than regular partition

1999-09-16 Thread D. Lance Robinson

Optimizing the md driver for Bonnie, IMHO, is foolishness. Bonnie is a
sequential read/write test and does not produce numbers that mean much
for typical data access patterns. Example: the read_ahead value is bumped
way up (1024), which kills performance for more normal accesses.
Linux's average contiguous data request size is much smaller than
512KB. Yes, this makes Bonnie look better, but it does not help a real
working system.

It is nice to have high Bonnie results, but not at the expense of a
working system. I wish I knew of a more statistically oriented data
access test like Netbench, but on the server side. The reason Bonnie is
so popular is that it is easy (and cheap).

In the RAID-1 case, a Bonnie test will not highlight the advantages of
read balancing. Someone can tune the chunk size to work best with
Bonnie, but the best chunk size for a RAID-1 test under Bonnie will most
likely be a bad choice for a normally operating system.

Please don't think that Bonnie results always mean much. They are fun to
compare, but be careful in how the numbers are interpreted.


<>< Lance.


[EMAIL PROTECTED] wrote:
> 
> On Wed, 15 Sep 1999, James Manning wrote:
> 
> > > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> > > Machine  MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
> > > md0 192  5933 86.4 15222 21.8  4172 11.8  5672 81.3  9014 11.2 218.4  4.6
> > > sd0 192  6411 92.0 15072 18.5  4265 11.7  5760 80.6 12069 13.1 201.8  4.5
> >
> > More cases with faster write access (significantly) than read... am I
> > wrong in thinking this is strange?  Is bonnie really worth trusting?
> > Is there a better tool currently available?
> 
> bonnie is the main benchmark i'm optimizing for. hdparm -tT is rather
> useless in this regard, it has only a relevance on maybe e2fsck times.
> 
> i'll have a look at RAID1 read balancing. I once ensured we read better
> than single-disk, but we might have lost this property meanwhile ...
> 
> -- mingo



Re: the 12 disk limit

1999-08-30 Thread D. Lance Robinson

Lawrence,

If you don't care about being 'standard', there is plenty of fluff in
the superblock to make room for more disks. I don't know how well
behaved all the tools are at using the symbolic constants, though. To
support 18 devices, you will need to allow at least 19 disks (one for
the spare/replacement), but I like using even numbers, so round it up to
20.

You'd have to make the following changes.

In linux/include/linux/raid/md_k.h
#define MAX_REAL 20

Change lines in linux/include/linux/raid/md_p.h so it reads something
like...
#define MD_SB_DESCRIPTOR_WORDS   32
#define MD_SB_DISKS  20
#define MD_SB_DISKS_WORDS (MD_SB_DESCRIPTOR_WORDS * MD_SB_DISKS)

These have to be above the line with MD_SB_RESERVED_WORDS.
NOTE: the md driver primarily uses MD_SB_DISKS for the maximum disk
count. The MAX_REAL value is also used (twice), but it could have
just as well used the MD_SB_DISKS value. Oh well.

Then recompile both the kernel and the tools.

Try it out, let us know if the tools work.
<>< Lance.


Lawrence Dickson wrote:
> 
> All,
>I guess this has been asked before, but - when will the RAID
> code get past the 12 disk limit? We'd even be willing to use
> a variant - our customer wants 18 disk RAID-5 real bad.
>Larry Dickson
>Land-5 Corporation



Re: Why RAID1 half-speed?

1999-08-30 Thread D. Lance Robinson

Hi Mike,

You are using a very small chunk size. Increase this number to 128. I
think you may need to remake the array, though. This is kind of silly,
since in RAID-1 the data isn't laid out any differently for different
chunk sizes, as it is for the other raid personalities. It would be nice
to be able to just edit the raidtab file and have it change automagically.

The significance of the chunk size is this: the RAID-1 personality has a
read balancing mechanism that tries to use the same drive so long as the
requests are sequential and not bigger than the chunk size. With a chunk
size of 4, the raid driver breaks the read requests into 4KB chunks and
then switches to the next disk, which is hardly optimal. A value of 128
is much better. For Bonnie tests, the larger the better, but Bonnie is
not a real-world test. I find 128 a good compromise.
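For illustration, the raidtab entry would then look something like this (only the
chunk-size differs from yours; as noted, the array may need to be remade for the
change to take effect):

raiddev /dev/md0
        raid-level            1
        nr-raid-disks         2
        nr-spare-disks        0
        persistent-superblock 1
        chunk-size            128

        device                /dev/hda1
        raid-disk             0
        device                /dev/hdb1
        raid-disk             1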

<>< Lance.


Mike Black wrote:
> 
> I'm a little confused on RAID1...running 2.2.11 with
> raid0145-19990824-2.2.11.bz2 on a PII/233
> 
> I just set up a mirror this weekend on an IDE RAID1 - two 5G disks on the
> same IDE bus (primary and master).
> 
> I was under the impression that I shouldn't see any slowdown and maybe even
> a speedup but, alas, it is not so.
> 
> Here's the hparm test (ran several times -- similar results each time):
> 
> /dev/hda:
>  Timing buffer-cache reads:   64 MB in  0.95 seconds =67.37 MB/sec
>  Timing buffered disk reads:  32 MB in  3.28 seconds = 9.76 MB/sec
> 
> /dev/md0:
>  Timing buffer-cache reads:   64 MB in  0.85 seconds =75.29 MB/sec
>  Timing buffered disk reads:  32 MB in  6.10 seconds = 5.25 MB/sec
> 
> It looks like I've lost half of the bandwidth on disk reads.  Did I miss
> something??  Here's the raidtab entry:
> 
> raiddev /dev/md0
> raid-level1
> nr-raid-disks 2
> nr-spare-disks0
> persistent-superblock 1
> chunk-size4
> 
> device/dev/hda1
> raid-disk 0
> device/dev/hdb1
> raid-disk 1
> 
> 
> Michael D. Black   Principal Engineer
> [EMAIL PROTECTED]  407-676-2923,x203
> http://www.csi.cc  Computer Science Innovations
> http://www.csi.cc/~mike  My home page
> FAX 407-676-2355



Re: seeking advice for linux raid config

1999-07-21 Thread D. Lance Robinson

James,

There are currently 128 possible SCSI disk devices allocated in the
device map; see linux/Documentation/devices.txt. Now, each of these
supports partitions 1..15 (the lower 4 bits of the minor number), with 0
being the raw device, and the other bits for the base device are mapped
into various places. There is a slight chance of modifying things so that
you have fewer partition bits and give those unused bits to the base SCSI
devices. I don't know how well disciplined the SCSI code is in using the
conversion macros from device and partition to device number.
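In other words, the sd minor number is packed roughly like this (illustrative macro;
check the actual conversion macros in the SCSI code):

/* 16 minors per SCSI disk: minor 0 is the whole disk, 1..15 the partitions */
#define SD_PARTITION_BITS   4
#define SD_MINOR(disk, part)    (((disk) << SD_PARTITION_BITS) | (part))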

You have another problem with the md driver (raid). Its superblock is
coded to allow raid sets of up to 11 devices (12 if you count the
spare). This is a #define set to 12. You should be able to increase this
value to 16 and recompile the kernel and tools.

I have heard of a large-file patch for the ext2 filesystem that you may
be able to use. FYI: the ext2 filesystem is limited to a 1-terabyte
maximum per volume.

<>< Lance.


[EMAIL PROTECTED] wrote:
> 
> > The Software RAID solution will give you all the flexibility you need.
> > If you have already considered it, and discarded it as an option for
> > some reason, I'd be grateful to know about that reason.
> 
> The 16-scsi-drive limitation that existed (at least at one time).
> While the limit may be higher now, being over 240 (ideally 256 minimum
> seems unlikely (would require 16 device major's afaict, at least with
> the current partition/minor config).  If this limitation is gone, I
> would *love* to do pure s/w raid, that's for sure...
> 
> James
> --
> Miscellaneous Engineer --- IBM Netfinity Performance Development



Re: RAID-0 Slowness

1999-07-01 Thread D. Lance Robinson

Mark,

Having a very large chunk size would reduce the performance down close to that
of a single device. There are two performance factors to keep in mind: access time and
throughput. Access time is important for the many small files and accesses
needed, and throughput is needed for large requests. Mixed in with these factors
are request overhead latency, the average seek/access time, the sustained
throughput of a single device, and the size of the device's cache buffer.
   In setting the chunk size, I suppose there may be two schools of thought.
First, make the chunk size large enough that one spindle can handle an entire request,
freeing the other spindles to work on other areas (this improves access times); or,
secondly, have all the spindles working in parallel (this increases
throughput). And there is a third strategy: setting the chunk size to work well with
both large and small requests.
   On a typical system, most requests (and files) are small (< 4KB), but there
are many larger requests (> 256KB), such as loading object code. I suggest a chunk
size of around 64KB, since it gives better access times for small
requests and also increases throughput for larger requests (by sharing the
requests across spindles). 128KB may work just as well, but this exceeds the size of
some cache buffers, and some device drivers cannot request more than 64KB in one request.

My two cents worth.
<>< Lance.

Marc Mutz wrote:

> D. Lance Robinson wrote:
> >
> > Try bumping your chunk-size up. I usually use 64. When this number is low,
> > you cause more scsi requests to be performed than needed. If really big (
> > >=256 ) RAID 0 won't help much.
> >
> What if the chunk size matches ext2fs's group size (i.e. 8M)? This would
> give very good read/write performance with moderatly large files (i.e.
> <8M) if multiple processes do access the fs, because ext2fs usually
> tries to store a file completely within one block group. The performance
> gain would be n-fold, if n was the number of disks in the raid0 array
> and the number of processes was higher than that.
> It would give only single-speed (so to speak) for any given application,
> though.
> But then: Wouldn't linear append be essentially the same, given that
> ext2fs spreads files all across the block groups from the beginning?
>
> Would that not be the perfect setup for a web server's documents volume,
> with MinServers==n? The files are usually small and there are usually
> much more than n servers running simultaneously.
>
> Is this analysis correct or does it contain flaws?
> What be the difference between raid0 with 8M chunks and linear append?
>
> Just my thoughts wandering off...
>
> Marc



Re: RAID-0 Slowness

1999-06-30 Thread D. Lance Robinson

Try bumping your chunk-size up. I usually use 64. When this number is low,
you cause more SCSI requests to be performed than needed. If it is really big
(>= 256), RAID-0 won't help much.

<>< Lance.

Richard Schroeder wrote:

> Help,
> I have set up RAID-0 on my Linux Redhat 6.0.  I am using RAID-0
> (striping) with two IDE disks (each disk on it's own IDE controller).
> No problems in getting it running.  However, my tests show I/O
> performance seems to be worse than on a "normal" non-RAID filesystem.  I
> have tried different chunk-sizes to no avail.  I must be missing
> something.  Shouldn't I be seeing a slight performance gain?
>
> Here is my /etc/raidtab:
>   raiddev /dev/md0
>   raid-level 0
>   nr-raid-disks 2
>   nr-spare-disks 0
>   chunk-size 4
>   persistent-superblock 1
>   device  /dev/hda8
>   raid-disk 0
>   device  /dev/hdc8
>   raid-disk 1
>
> Curious
>
> Richard Schroeder
> [EMAIL PROTECTED]



Re: What hardware do you recommend for raid?

1999-06-14 Thread D. Lance Robinson

Hi Lucio,

Lucio Godoy wrote:
> 
> The idea of using raid is to add  more disks onto the scsi
> controler (Hot adding ?) when needed and combine the newly
> added disk to the previous disks as one physical device.
> 
> Is it possible to add another disk without having to switch of the
> machine?

There are special disk enclosures that allow you to add new scsi disks
into the drive bays without turning the power off. HOWEVER, the RAID
device driver does not allow you to add a disk to enlarge the raid
device's size. Hot adding is only used for replacement of a faulty
device.


> is it possible to combine that newly added disk to the previous physical
> device?

Not to make it bigger as stated above.

If you want to enlarge a device using RAID level 0, 4 or 5, you will
need to:
* backup your data.
* verify your backup is okay.
* add the disk.
* create a new RAID device (mkraid)
* restore your backup.

<>< Lance.



Re: How to read /proc/mdstat

1999-05-30 Thread D. Lance Robinson

To identify the spare devices through /proc/mdstat...

1) Look for the  [#/#]  value on a line. The first number is the
   number of disks in the complete raid device as defined. Let's say it is 'n'.
2) The raid role numbers [#] following each device indicate its
   role, or function, within the raid set. Any device with role 'n' or
   higher is a spare disk; roles 0,1,..,n-1 make up the working array.

Also, if you have a failure, the failed device will be marked with (F)
after the [#]. The spare that replaces this device will be the device
with the lowest role number n or higher that is not marked (F). Once the
resync operation is complete, the devices' role numbers are swapped.
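For example, a hypothetical line like the one below describes a raid-5 array that
wants four working disks ([4/4]); sda1..sdd1 hold roles 0-3 and sde1[4] is the spare
(the block count is made up):

md0 : active raid5 sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0] 8883200 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]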

Don't count on the order in which the devices appear in the /proc/mdstat
output.

<>< Lance.


Osma Ahvenlampi wrote:
> 
> This is the /proc/mdstat output on a particular kernel 2.0.36 +
> raid0145-19990421 system equipped with six SCSI disks, configured as
> (multiple) 5-disk RAID-5 plus one hot spare disk. However, it's not
> immediately obvious to me from the output WHICH of the disks is the
> spare (I know that it's /dev/sdf, since that's the one I added as
> spare after creating the array with no spare disk, but what if I
> didn't know that?).
> 
> My motivation to ask this is actually so that I might be able to
> decide whether I could tell the spare disk to spin down, since it's
> not it use. No point having it spinning wearing itself down when the
> point of it is to work in case one of the others fail.
> 
> # cat /proc/mdstat
> Personalities : [raid1] [raid5] [translucent]
> read_ahead 1024 sectors
> md0 : active raid1 sdb2[1] sda2[0] 64192 blocks [2/2] [UU]
> md1 : active raid5 sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1] sda5[0] 706304 blocks level 5, 32k chunk, algorithm 2 [5/5] [U]
> md2 : active raid5 sdf6[5] sde6[4] sdd6[3] sdc6[2] sdb6[1] sda6[0] 1959424 blocks level 5, 32k chunk, algorithm 2 [5/5] [U]
> md3 : active raid5 sdf7[5] sde7[4] sdd7[3] sdc7[2] sdb7[1] sda7[0] 1959424 blocks level 5, 32k chunk, algorithm 2 [5/5] [U]
> md4 : active raid5 sdf8[5] sde8[4] sdd8[3] sdc8[2] sdb8[1] sda8[0] 30587136 blocks level 5, 32k chunk, algorithm 2 [5/5] [U]
> unused devices: 
> 
> --
> Osma Ahvenlampi



Re: raid1 on ide decreases read performance

1999-05-29 Thread D. Lance Robinson

Don't start to think that Bonnie gives real-world performance numbers.
It gives single-tasking, sequential-access throughput values. Sure,
Bonnie's numbers have some value, but don't think that its results match
typical system access patterns.

The performance difference with RAID-1 is seen when doing several
I/O-bound tasks simultaneously. Bonnie doesn't come close to doing this.

<>< Lance.

[EMAIL PROTECTED] wrote:

> Yes, I guess you're right that the way raid-1 stripes the reads doesn't
> necessarily yield higher read performance after all...   Here's a little
> test I did:
> 
> raid-0 on two disks:
>   ---Sequential Output ---Sequential Input-- --Random--
>   -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> MachineMB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>   900  6160 97.1 21710 73.3  8559 52.5  7841 94.2 23977 63.9 157.3  5.5
> 
> raid-1 on the same disks:
>   ---Sequential Output ---Sequential Input-- --Random--
>   -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> MachineMB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
>   470  5801 94.6 11719 39.5  5264 32.5  6931 83.0 11861 34.8 167.4  4.7
> 
> Hmm   I know that raid-1 does distribute the reads to both disks, so I would
> think that read-performance should increase.  But it seems like it doesn't. At
> least not in this case.   Btw. the disks where on separate SCSI controllers.
>



Re: raid1 on ide decreases read performance

1999-05-28 Thread D. Lance Robinson

> >
> > The bottom line: Read performance for a RAID-1 device is better than a
> > single (JBOD) device. The bigger the n in n-way mirroring gives better
> > read performance, but slightly worse write performance.
> >
> But using n-way mirrors will also increase cpu utilization during reads
> -
> or am I wrong? - because of the cycling process.

CPU utilization is not increased for reading by higher n's in n-way
mirroring. Only one device is asked for the data, and the overhead of the
balancing is small. If CPU utilization goes up while reading, it is
because your throughput is higher :-)

Memory bus utilization (and thus CPU utilization) is increased when
writing with higher n's in n-way mirroring. This is because the data is
duplicated across the memory bus n times. This is true with both IDE and
SCSI.

<>< Lance.



Re: raid1 on ide decreases read performance

1999-05-28 Thread D. Lance Robinson

Osma,

RAID-1 does read balancing, which may(?) be better than striping. Each
read request is checked against the previous request: if it is
contiguous with the previous request, it uses the same device;
otherwise it switches to the next mirror. This process cycles through
the mirrors (n-way mirrors).
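A rough sketch of that rule, with made-up names (illustrative only, not the actual
raid1.c code):

struct fake_mirror_state {
        long last_sector, last_count;   /* end of the previous read request      */
        long run_start;                 /* where the current sequential run began */
        long chunk_sectors;             /* chunk size, in sectors                 */
        int  current_disk, nr_mirrors;
};

static int pick_read_disk(struct fake_mirror_state *s, long sector, long count)
{
        /* Sequential with the previous request and still within one chunk:
         * keep reading from the same mirror. Otherwise cycle to the next. */
        if (sector != s->last_sector + s->last_count ||
            sector + count - s->run_start > s->chunk_sectors) {
                s->current_disk = (s->current_disk + 1) % s->nr_mirrors;
                s->run_start = sector;
        }
        s->last_sector = sector;
        s->last_count = count;
        return s->current_disk;
}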

The bottom line: read performance for a RAID-1 device is better than for a
single (JBOD) device. A bigger n in n-way mirroring gives better
read performance, but slightly worse write performance.

<>< Lance.

Osma Ahvenlampi wrote:
> 
> Dietmar Stein <[EMAIL PROTECTED]> writes:
> > Readperformance will only increase by using raid0 (stripe), but it will
> > not be twice times faster.
> 
> Does the Linux RAID-1 code still not stripe reads? I thought it did.
> 
> --
> Osma Ahvenlampi



Re: Add expansion of exisiting RAID 5 config in software RAID?

1999-05-27 Thread D. Lance Robinson

The answer is still the same (May 1999).

<>< Lance.

Scott Smyth wrote:
> 
> I would like to explore the requirements of expanding
> RAID 0,4, and 5 levels from an existing configuration.
> For example, if you have 3 disks in a RAID 5 configuration,
> you currently cannot add a disk to the RAID 5 without
> destructively remaking the RAID 5 configuration and reformatting
> the multiple block device upon completion.  Is anyone
> working on (I remember it mentioned previously on the list)
> what has been called "resize array" in the software RAID
> howto in the wish list section.
> 
> from RAID 5 FAQ:
> 
>2.Q: Can I add disks to a RAID-5 array?
> 
>  A: Currently, (September 1997) no, not without erasing all data.
> A conversion utility to allow this does not yet exist. The problem
> is that the actual structure and layout of a RAID-5 array depends
> on the number of disks in the array. Of course, one can add
> drives by backing up the array to tape, deleting all data, creating
> a new array, and restoring from tape.
> 
> thanks,
> Scott



Fix for /proc/mdstat & raidstop panic

1999-05-13 Thread D. Lance Robinson

Hi all,

Attached is a fix for a problem that happens when /proc/mdstat is read
when a raid device is being stopped. A panic could result.

Not many users are reading /proc/mdstat much or stopping a raid device
manually, but this problem caused us many headaches.

The problem happens something like this:
1) raidstop is run.
2) The raidstop process removes the superblock structure from the raid
   device structure before the device is removed from the list of raid devices.
3) /proc/mdstat starts reading the raid device structures and
   tries to read superblock data that no longer exists.
4) Panic.

Solution:
* Added a new semaphore that protects the all_mddevs list.
* Added lock and unlock code around each reference to the list.
* Needed to fix some other related semaphore use.
* Modified md_status so it will check for a null sb pointer; if found,
  a message like the following is given:
 md1 : inactive sb

Note: I could very quickly run into the problem before (in about 5
seconds). To test the fix, I ran a few scripts that started and
stopped three independent raid arrays while another script just read
/proc/mdstat. This ran for over 50,000 total start/stop cycles. For some
reason, one of the raid devices got into the 'D' state. This seems
unrelated to the given fix, since down_interruptible is being used.

Back to other things...
<>< Lance.

--- linux-r16a/drivers/block/md.c   Tue May 11 00:05:30 1999
+++ linux/drivers/block/md.cThu Apr 29 23:23:53 1999
@@ -162,6 +162,18 @@
  */
 static MD_LIST_HEAD(all_mddevs);
 
+/*
+ * The all_mddevs_sem must be taken before modifying the all_mddevs list.
+ * It should only be needed when either adding or removing an mddev.
+ * You must NOT have any mddev->reconfig_sem locked while locking
+ * this semaphore.
+ */
+static struct semaphore all_mddevs_sem = MUTEX;
+
+/*
+ * Allocates an mddev structure.
+ * Returns: the pointer to the mddev_t structure which is locked.
+ */
 static mddev_t * alloc_mddev (kdev_t dev)
 {
mddev_t * mddev;
@@ -186,9 +198,13 @@
 * personalities can create additional mddevs 
 * if necessary.
 */
+   lock_all_mddevs();
+   lock_mddev( mddev );
add_mddev_mapping(mddev, dev, 0);
md_list_add(&mddev->all_mddevs, &all_mddevs);
+   unlock_all_mddevs();
 
+   /* NOTE: this mddev is still locked! */
return mddev;
 }
 
@@ -208,9 +224,14 @@
while (md_atomic_read(&mddev->recovery_sem.count) != 1)
schedule();
 
+   unlock_mddev( mddev );
+   lock_all_mddevs();  /* lock the list */
+   lock_mddev( mddev );/* Just in case we got blocked for all. */
+
del_mddev_mapping(mddev, MKDEV(MD_MAJOR, mdidx(mddev)));
md_list_del(&mddev->all_mddevs);
MD_INIT_LIST_HEAD(&mddev->all_mddevs);
+   unlock_all_mddevs();
kfree(mddev);
 }
 
@@ -1878,6 +1899,7 @@
md_list_del(&rdev->pending);
MD_INIT_LIST_HEAD(&rdev->pending);
}
+   unlock_mddev(mddev);
autorun_array(mddev);
}
printk("... autorun DONE.\n");
@@ -2556,14 +2586,6 @@
err = -ENOMEM;
goto abort;
}
-   /*
-* alloc_mddev() should possibly self-lock.
-*/
-   err = lock_mddev(mddev);
-   if (err) {
-   printk("ioctl, reason %d, cmd %d\n",err, cmd);
-   goto abort;
-   }
err = set_array_info(mddev, (void *)arg);
goto done_unlock;
 
@@ -3189,6 +3221,7 @@
}
 
if (!mddev->pers) {
+   unlock_mddev( mddev );
sz += sprintf(page+sz, "\n");
continue;
}
@@ -3201,9 +3234,12 @@
if (md_atomic_read(&mddev->resync_sem.count) != 1)
sz += sprintf(page + sz, " resync=DELAYED");
}
+   unlock_mddev( mddev );
sz += sprintf(page + sz, "\n");
}
sz += status_unused (page + sz);
+
+   unlock_all_mddevs();
 
return (sz);
 }
--- linux-r16a/include/linux/raid/md_k.hTue May 11 00:05:30 1999
+++ linux/include/linux/raid/md_k.h Thu Apr 29 23:23:52 1999
@@ -294,6 +294,13 @@
ITERATE_RDEV_GENERIC(pending_raid_disks,pending,rdev,tmp)
 
 /*
+ * It would be better for these to be inline, but all_mddevs_sem is static.
+ * This is the locking mechanism for the all_mddevs list.
+ */
+#define lock_all_mddevs()  down_interruptible( &all_mddevs_sem )
+#define unlock_all_mddevs()up( &all_mddevs_sem )
+
+/*
  * iterates through all used mddevs in the system.
  */
 #define ITERATE_MDDEV(mddev,tmp)   \



Re: Swap on raid

1999-05-10 Thread D. Lance Robinson

Hi,

You can run a system without a swap device. But if you do 'swapoff -a'
_after_ a swap device failure, you are dead (if swap had any virtual
data stored in it).

'swapoff -a' copies virtual data stored in the swap device back to physical
memory before closing the device. This is very different from losing
access to the swap data due to a failure.

<>< Lance.

[EMAIL PROTECTED] wrote:
> 
> Hm,
> 
> I understand the necessary of redundancy; but isn't it the same
> if you do a swapoff -a or swap-disks dies on a system?
> What I have in mind is the thing, that the system should not swap
> at all, so that it is necessary to have as much memory (RAM) as
> possible.



system panic when reading /proc/mdstat while doing raidstop.

1999-05-07 Thread D. Lance Robinson

There seems to be a major problem when reading /proc/mdstat while a raid
set is being stopped. This conflict will very rarely be seen, but
I have a daemon that monitors /proc/mdstat every two seconds and once in
a while the system panics when doing testing.

While running the script below, which pounds on (reads) /proc/mdstat
for about a second and then backs off for a second, I got a panic after 3
start/stop cycles, and it happened while doing the stop.

Also, doing this exercise, there is something else interesting. Without
the 'sleep 1', the raid driver will not resync or stop. I would think
that a window of time would eventually open up, but after about a minute
of waiting, still nothing happened.

Another thing: reading /proc/mdstat seems to be relatively slow. It
takes over a second to read it 100 times. Perhaps this delay is also in
the script processing.

Any comments or fixes :-)  are appreciated.

<>< Lance.


#--START OF SCRIPT---
#!/bin/bash

count=0
icount=0

while [ 1 ]; do
 cp /proc/mdstat /dev/null
 let count=count+1
 let icount=icount+1
 if [ $icount = 100 ]; then
cat /proc/mdstat
sleep 1
echo $count
icount=0
 fi
done
#--END OF SCRIPT---



Re: RAID+devfs patch for new kernel?

1999-05-01 Thread D. Lance Robinson

Hi Steve,

I made the patches that are on Richard's site for raid+devfs.
Unfortunately, I was having too many problems with devfs on my PowerPC
system and had to solve the problems without devfs. I still have a patch
file that I used to help create the raid+devfs patch. I don't know if it
fixes all the devfs patch problems for the current versions, but if
someone else wants to try, I'd be glad to give some simple instructions.

<>< Lance.

Steve Costaras wrote:
> 
> Does anyone know, or is anyone working on a new combined patch for
> the kernel (2.2.6 or 2.2.7) for both RAID & devfs?  The last one I've seen
> is for 2.2.3..



Memory buffer corruption with Raid on PPC

1999-04-14 Thread D. Lance Robinson

I have Linux 2.2.3 with the raid0145-19990309 patch.

On a PPC (Mac G3) system, I am getting what seems to be memory buffer
corruption when using raidstart. The same kernel source run on the i386
architecture seems to be fine.

To show the problem, I do something like the following...

#  cd ~me
#  gcc source_a.c 
#  raidstart /dev/md0 ; gcc source_a.c 
#  raidstop  /dev/md0 ; gcc source_a.c 
#  raidstart /dev/md0 ; gcc source_a.c 
#  raidstop  /dev/md0 ; gcc source_a.c 
#  raidstart /dev/md0 ; gcc source_a.c 
#  raidstop  /dev/md0 ; gcc source_a.c 
#  raidstart /dev/md0 ; gcc source_a.c 
#  raidstop  /dev/md0 ; gcc source_a.c 

Usually doing this will cause the compile to have a problem, such as an
"Illegal Instruction" or some compile error. This indicates some sort
of memory buffer corruption, since the gcc run is all done out of memory.

The problem seems to be in the raidstart area. It shows up after
starting the array, but once the array is started (and passes the gcc
test), it works fine.

Any ideas ?

Thanks, <>< Lance.



Re: Day 7 and still no satisfaction

1999-04-02 Thread D. Lance Robinson

Carl,

The 2.2.4 kernel does not have the latest raid code. But, the raid
patches do not yet cleanly apply to the 2.2.4 kernel.  I suggest you
start with the 2.2.3 kernel, apply the appropriate raid patches
(raid0145-19990309-2_2_3.gz), and get the latest raidtools
(raidtools-19990309-0_90_tar.gz).  The best, but not great,
documentation comes with the raidtools.

<>< Lance.

> Carl Hilinski wrote:
> 
> I am quickly reaching the end of the rope. I wanted to learn about
> RAID in linux (having used it much in NT), so I tried to patch Redhat
> 2.0.36 with the 0145 raid patch, which simply returned "X not set"
> messages in defconfig.rej. Since I could find no info on what to do to
> solve that, I upgraded to kernel 2.2.4 (which I assume doesn't need
> the 0145 patch since it has the "personalities" and raid 1 and 5 can
> be selected in the make config). So I set up a 100+mb partition as
> hda5 and a 100+mb partition on hdb1(both of which were configured
> under the original Redhat 5.2 install), umounted them, set up the
> raidtab to say use Raid 1 with the /dev/hda5 and /dev/hdb1 partitions,
> no spares and the persistent superblock 1 value. When I do a
> mkraid --really-force /dev/md1, I get the message:
> disk 0: /dev/hdb1 166129kb, raid superblock at 166016kb
> disk 1: /dev/hda5 167296kb, raid superblock at 167232kb
> mkraid: aborted
> 
> What happened? I've spent days and days on trying to make this work (I
> had to install a WinNT server because I had a deadline and couldn't
> make this work). What did I do wrong? And how would I know? There's no
> docs on what happens when it all goes wrong.
> 
> ch



Re: Filesystem corruption (was: Re: Linux 2.2.4 & RAID - success report)

1999-03-31 Thread D. Lance Robinson

I have also experienced file system corruption with 2.2.4. The problem most
likely lies in fs/buffer.c, with which the raid patch had a conflict.

<>< Lance.

Tony Wildish wrote:

>  this sound to me like bad memory. I had a very similar problem recently
> and it was a bad SIMM. I was lucky enough to have four SIMMS in the
> machine so I can still run with only two, having removed the bad SIMM and
> its partner

>
> On Mon, 29 Mar 1999, Richard Jones wrote:
>
> > Not so fast there :-)
> >
> > In the stress tests, I've encountered almost silent
> > filesystem corruption. The filesystem reports errors
> > as attached below, but the file operations continue
> > without error, corrupting files in the process. At
> > no time did the RAID software report any problem, nor
> > did any reconstruction kick in.
> >
> > Anyone have any ideas what might be going on? It doesn't
> > seem to be exclusively a 2.2.4 thing. I've seen similar
> > problems with 2.0.36-19990128.



raid5: md0: unrecoverable I/O error for block x

1999-03-12 Thread D. Lance Robinson

Hi,

If I "scsi remove-single-device" two devices from a RAID5, I would
expect the RAID device to eventually fail itself. But it seems to be in
some sort of loop spitting out

 raid5: md0: unrecoverable I/O error for block 

Where  seems to be cyclic. Top shows that raid5d is taking 99% of
the cpu.

Note: the device was busy with activity when I logically removed the
n-2nd device.

<>< Lance.



read_ahead in md driver.

1999-03-12 Thread D. Lance Robinson

Hi,

I have noticed that the read_ahead value is set to 1024 in the md
driver. Why is this value so large? I would think a value of 128 or so
would be more appropriate.

<>< Lance.



md: bug in file raid5.c, line 666 (line of raid5_error code)

1999-02-15 Thread D. Lance Robinson

Hi,


I am doing some tests with raid. I will probably have more posts on
other situations, but here is a situation that causes raid problems...

scenario:
1) mkraid /dev/md/0# raid5 three drive, no spare (using devfs)
2)   Wait for resync to complete
3)   Disable one of the drives.
4) mke2fs /dev/md/0

Environment:
PC + Linux 2.2.2pre2, + raid 19990128 patch + devfs + (out of memory
patch) + sym53c8xx scsi driver.

The mke2fs process starts queuing *many* SCSI requests before the first
request fails on the crippled device. Each of the queued SCSI requests
fails and starts a SCSI bus reset cycle, and the raid driver spits out
its  *  * along with other things, and somewhere
in there is a program bug message. Each SCSI reset cycle takes over a second,
and there were perhaps 150 or more queued items. It took a while before
the system gave up on the mkfs process. After that, things worked okay
in degraded mode.

Can't the raid driver de-queue any requests it has for a device it has
marked bad? In my case, the raid driver apparently re-issued the device
requests for the same blocks to the other good drives. This eventually
ran the system out of memory, which terminated the mke2fs. I don't mind this
happening to mkfs, but it may happen to something else much more
critical.

Note: since I have the devfs changes in raid5.c, the line number 666 (I didn't make
that up) is probably different from the one in the standard code. The message comes
from an MD_BUG message within the raid5_error() routine.



<>< Lance.


Log of some of the bad activity

Feb 15 14:21:02 myk6 kernel: ncr53c895-0-<1,*>: FAST-40 WIDE SCSI 80.0
MB/s (25 ns, offset 15)
Feb 15 14:21:02 myk6 kernel: ncr53c895-0-<6,*>: FAST-40 WIDE SCSI 80.0
MB/s (25 ns, offset 31)
Feb 15 14:21:04 myk6 kernel: scsi0 channel 0 : resetting for second half
of retries.
Feb 15 14:21:04 myk6 kernel: SCSI bus is being reset for host 0 channel
0.
Feb 15 14:21:01 myk6 kernel: scsidisk I/O error: dev 08:01, sector 88
Feb 15 14:21:01 myk6 kernel: md: bug in file raid5.c, line 666
Feb 15 14:21:01 myk6 kernel: 
Feb 15 14:21:01 myk6 kernel:**
Feb 15 14:21:01 myk6 kernel:*  *
Feb 15 14:21:01 myk6 kernel:**
Feb 15 14:21:01 myk6 kernel: md0:
 array superblock:
Feb 15 14:21:01 myk6 kernel:   SB: (V:0.90.0)
ID: CT:36c88867
Feb 15 14:21:01 myk6 kernel:  L5 S04440832 ND:3 RD:3 md0 LO:0
CS:32768
Feb 15 14:21:01 myk6 kernel:  UT:36c88f64 ST:0 AD:2 WD:2 FD:1 SD:0
CSUM:14c414c6 E:0008
Feb 15 14:21:01 myk6 kernel:  D  0: 
DISK
Feb 15 14:21:01 myk6 kernel:  D  1: 
DISK
Feb 15 14:21:01 myk6 kernel:  D  2: 
DISK
Feb 15 14:21:01 myk6 kernel:  D  3:  DISK
Feb 15 14:21:01 myk6 kernel:  D  4:  DISK
Feb 15 14:21:01 myk6 kernel:  D  5:  DISK
Feb 15 14:21:01 myk6 kernel:  D  6:  DISK
Feb 15 14:21:01 myk6 kernel:  D  7:  DISK
Feb 15 14:21:01 myk6 kernel:  D  8:  DISK
Feb 15 14:21:01 myk6 kernel:  D  9:  DISK
Feb 15 14:21:01 myk6 kernel:  D 10:  DISK
Feb 15 14:21:01 myk6 kernel:  D 11:  DISK
Feb 15 14:21:01 myk6 kernel:  THIS: 
DISK
Feb 15 14:21:01 myk6 kernel:  rdev sd/c0b0t0u0p1: O:sd/c0b0t0u0p1,
SZ: F:1 DN:0 no rdev sup
erblock!
Feb 15 14:21:01 myk6 kernel:  rdev sd/c0b0t6u0p1: O:sd/c0b0t6u0p1,
SZ:0032 F:0 DN:2 rdev superb
lock:
Feb 15 14:21:01 myk6 kernel:   SB: (V:0.90.0)
ID: CT:36c88867
Feb 15 14:21:01 myk6 kernel:  L5 S04440832 ND:3 RD:3 md0 LO:0
CS:32768
Feb 15 14:21:01 myk6 kernel:  UT:36c88f64 ST:0 AD:2 WD:2 FD:1 SD:0
CSUM:4b8ca45b E:0008
Feb 15 14:21:01 myk6 kernel:  D  0: 
DISK
Feb 15 14:21:01 myk6 kernel:  D  1: 
DISK
Feb 15 14:21:01 myk6 kernel:  D  2: 
DISK
Feb 15 14:21:01 myk6 kernel:  D  3:  DISK
Feb 15 14:21:01 myk6 kernel:  D  4:  DISK
Feb 15 14:21:01 myk6 kernel:  D  5:  DISK
Feb 15 14:21:01 myk6 kernel:  D  6:  DISK
Feb 15 14:21:01 myk6 kernel:  D  7:  DISK
Feb 15 14:21:01 myk6 kernel:  D  8:  DISK
Feb 15 14:21:01 myk6 kernel:  D  9:  DISK
Feb 15 14:21:01 myk6 kernel:  D 10:  DISK
Feb 15 14:21:01 myk6 kernel:  D 11:  DISK
Feb 15 14:21:01 myk6 kernel:  THIS: 
DISK
Feb 15 14:21:01 myk6 kernel:  rdev sd/c0b0t1u0p1: O:sd/c0b0t1u0p1,
SZ:0032 F:0 DN:1 rdev superb
lock:
Feb 15 14:21:01 myk6 kernel:   SB: (V:0.90.0)
ID: CT:36c88867
Feb 15 14:21:01 myk6 kernel:  L5 S04440832 ND:3 RD:3 md0 LO:0
CS:32768
Feb 15 14:21:01 myk6 kernel:  UT:36c88f64 ST:0 AD:2 WD:2 FD:1 SD:0
CSUM:4b8ca449 E:0008
Feb 15 14:21:01 myk6 kernel:  D  0: 
DISK
Feb 15 14:21:01 myk6 kernel:  D  1: 
DISK
Feb 15 14:21:01 myk6 kernel:  D  2: 
DISK
Feb 15 14:21:01 myk6 kernel:  D  3:  DISK
Feb 15 14:21:01 myk6 kernel:  D  4:  DISK
Feb 15 14:21:01 myk6 kernel:  D  5:  DISK
Feb 15 14:21:01 myk6 kernel:  D  6:  DISK
Feb 15 14:21:01 myk6 kernel:  D  7:  DISK
Feb 15 14:21:01 myk6 kernel:  D  8:  DISK
Feb 15 14:21:01 myk6 kernel:  D  9:  DISK
Feb 15 14:21:01 myk6 kernel:  D 10:  D

Re: disconnecting live disks

1999-02-05 Thread D. Lance Robinson

steve rader wrote:
> 
> Some eec person once told me that disconnecting live molex
> (power) scsi connectors can kill a disk drive.  And I'm also
> not confortable futzing with scsi connectors on live busses.
> 
> I assume the perferred method is to put the disk-to-kill
> on a external power supply with a power switch.
> 
> Is there a safe way without an external power supply?
> 

Maybe you could rig a switch in a drive power cable to kill the 12-volt
line. Or you could make a power cable extension with a switch in it;
then you could remove it once you are done testing.

Killing the 12-volt line will effectively break the drive. It would be
interesting to see how various drives handle their error reporting.

I have a power splitter cable; I think I'll put a switch in it and see
what happens.


<>< Lance.



Re: [BUG] v2.2.0 heavy writing at raid5 array kills processes

1999-02-05 Thread D. Lance Robinson

Markus Linnala wrote:
> 
> v2.2.0 heavy writing at raid5 array kills processes randomly, including init.
> 
> Normal user can force random processes to out of memory
> situation when writing stuff at raid5 array. This makes the raid
> 
> I get 'Out of memory for init. ' etc. with following simple command:
> 
> dd if=/dev/zero of=file
> 


I am also getting "Out of memory for .." when trying to mke2fs. I
originally thought this was limited to a PowerPC, but I moved my raid
set to a PC and I get the same results.

My setup is a raid5 with three Ultra2 drives of 4GB each. I am using a
Symbios 53c895 chip with the alpha sym53c8xx driver, running Linux 2.2.1
and the latest (012899) raid patches.

It seems as though the processor is outpacing the I/O and using up the
64MB of memory for buffers.

I am curious what configurations out there do work. Or maybe it would be
better to know which ones don't, so this can get fixed.

<>< Lance.



Re: Physical device tracking....

1999-01-29 Thread D. Lance Robinson

James,

First of all, you probably want to reboot. This will rename your devices
to their typical values. To add a device into a failed raid slot, you
can use the raidhotadd command. Do something like:

raidhotadd /dev/md0 /dev/hdc2

This will add the device to the raid set and start a resync operation.

BTW: I hope you are only trying raid out with the setup you have shown
below. Using the same device more than once in a raid set is 1) slow,
and 2) does not protect your data.

I hope this helps some. I may be off target in what you have done and
what you want to do.

<>< Lance.


A James Lewis wrote:

> After testing various failure conditions, I seem to be stuck because the
> system allocated new disk numbers to the disks
> 
> RAID1 conf printout:
>  --- wd:1 rd:2 nd:3
>  disk 0, s:0, o:1, n:0 rd:0 us:1 dev:hdb1
>  disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00]
>  disk 2, s:1, o:0, n:2 rd:2 us:1 dev:hdb3
>  disk 3, s:1, o:0, n:3 rd:3 us:1 dev:hdb2
>  disk 4, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> 
> I need to get "disk 1" back to the correct device is there a way to do
> this?  Perhaps there is a FAQ, but I guess not since this is so new
>



Re: [BUG] v2.2.0 heavy writing at raid5 array kills processes

1999-01-29 Thread D. Lance Robinson

I have also noticed this type of problem.  It seems as though the RAID5
driver generates a growing write backlog and keeps allocating new
buffers when new asynchronous write requests get in. Eventually it
reserves all the available physical memory. Trying to swap data to
virtual memory storage would only make the situation worse.

I'm not sure where the responsibility lies for this problem. The md
driver can limit how much it allocates, but the memory manager should be
able to handle this situation better.


Markus Linnala wrote:
> 
> v2.2.0 heavy writing at raid5 array kills processes randomly, including init.
> 
> Normal user can force random processes to out of memory
> situation when writing stuff at raid5 array. This makes the raid
> 
> I get 'Out of memory for init. ' etc. with following simple command:
> 
> dd if=/dev/zero of=file
> 
> Repeatable, file is between 100-200M after dd gets killed.
> I guess this killing action seems to be triggered by swapping.
>



Where is 2.1.131-ac11 kernel

1998-12-18 Thread D. Lance Robinson

I've been hearing about 2.1.131-ac9, and now 2.1.131-ac11. What does the
-acX mean and where is it available?

Thanks, <>< Lance.



Re: raid0145 & devfs v79

1998-11-30 Thread D. Lance Robinson

Eric van Dijken wrote:

> Is there somebody working on joining the devfs patch and the raid patch in
> the linux kernel (2.1.130) ?
> 

I am planning on working on this issue sometime this week.

<>< Lance.



Raid5 pauses when doing mk2efs on PowerPC

1998-11-16 Thread D. Lance Robinson

Hi all,

The RAID5 md driver pauses for 10-11 seconds, many times, while doing a
mke2fs. The pauses start after 300-400 groups have been written; then a
small amount of transfers happen between pauses until the process is
done. The spurts of transfers between pauses range from 0.01 seconds
to maybe 3 seconds. Under 'normal' use afterwards, everything seems
fine. The problem is sensitive to memory resources: if there is more RAM,
the pauses start later in the mkfs. The problem is possibly a
race condition between the raid daemon and the code that restarts it.

My environment is...

PowerPC G3 266MHZ, 64MB ram.
Symbios (LSI) 53c895 PCI SCSI chip
3, 4GB LVD drives (1 Quantum Viking II, 2 Seagate Barracudas)
Kernel 2.1.127 & related raid patches.
raid5 cluster size is 32k.

Since most folks are using x86 systems and I haven't heard of this
problem on the raid list, it seems to be specific to the powerpc.

Any thoughts?