Re: md0 won't let go...
Harry,

Can you do simple things with /dev/hdl like... ?

    dd count=10 if=/dev/hdl of=/dev/null

It might help to see your device entry and other information. Can you give us the output of...

    ls -l /dev/hdl
    cat /etc/mtab
    cat /proc/mdstat
    cat /etc/mdtab
    dd count=10 if=/dev/hdl of=/dev/null

This gives us an overall picture of what we are up against. Thanks.

<>< Lance.
Re: raid1 question
Ben Ross wrote:
> Hi All,
>
> I'm using a raid1 setup with the raidtools 0.90 and mingo's raid patch
> against the 2.2.15 kernel.
...
> My concern is if /dev/sdb1 really crashes and I replace it with another
> fresh disk, partition it the same as before, and do a resync, everything
> on /dev/sdc1 (raid-disk 1) will be deleted.

There is a big difference between a resync from a mkraid or dirty restart and a resync to a spare disk. When resyncing to a spare, the array is in degraded mode and the driver knows which disks have valid data on them, so it reads only from those. The spare is only written to, and is only read from once the resync completes. In the case of a mkraid or dirty restart, the driver picks one disk to read from and sticks with it, for consistency's sake, until the resync is complete.

<>< Lance.
Re: Please help - when is a bad disk a bad disk?
Darren Nickerson wrote:
> +> 4. is there some way to mark this disk bad right now, so that
> +> reconstruction is carried out from the disks I trust? I do have a hot
> +> spare . . .
>
> Lance> You can use the 'raidhotremove' utility.
>
> This has never worked for me when the disk had not been marked as faulty by
> the RAID subsystem. Just says the disk is bizzy. That's why I was looking to
> set it faulty.

In that case, I would:

1) Do a normal shutdown of the machine.
2) Disconnect the bad drive.
3) Power up the system.

If the array starts, the missing drive will be automatically removed from the array. If the array doesn't start, try changing the raidtab file to match the new disk assignments. If that fails, put the disk back in and reboot. Disconnect the power from the bad disk while it is idle. Then access the file system.

<>< Lance.
Re: Please help - when is a bad disk a bad disk?
I hope this helps. See below.

<>< Lance.

> my questions are:
>
> 2. the disk seems to be "cured" by re-enabling DMA . . . but what is the state
> of my array likely to be after the errors above? Can I safely assume this was
> harmless? I mean, they WERE write errors after all, yes? Is my array still in
> sync? Is there any way to tell other than by unmounting the array and fscking?
> 3. is the failure simply not sufficiently severe to trigger removal from the
> array and hot reconstruction onto the host spare which is available?
>

The md driver calls the device's block driver for the specific device. It is there (or lower) that all media error detection and retries are performed. If a request made by the md driver fails (the buffer not being marked uptodate), the md driver assumes the device is bad and stops communicating with it (no retries attempted). There is an exception: the md driver will do some retries while doing a resync, but no retries are attempted under normal working conditions.

So, if the lower level device drivers for the IDE devices are working correctly by doing the occasionally needed retries and delivering the data as requested, the md driver never knows about any hiccups along the way. This is good and bad. Good in that the md driver doesn't need to worry about different types of devices and their peculiar behavior, but bad in that the md driver cannot predict device failures due to flaky or deteriorating hardware. So, if the md driver doesn't fail a drive, that is because the lower levels have taken care of all the nitty-gritty details and have supposedly performed the requested data transfer correctly. As long as the actual device drivers complete the requests, the md driver won't know about any problems.

> 4. is there some way to mark this disk bad right now, so that reconstruction
> is carried out from the disks I trust? I do have a hot spare . . .
>

You can use the 'raidhotremove' utility.
Re: Raid1 - dangerous resync after power-failure?
The event counter (and serial number) only indicates that the superblock is the most current. The SB_CLEAN bit is cleared when an array gets started, and is set when it is stopped (this automatically happens during a normal shutdown.) But, if the system crashes or the power gets yanked, the SB_CLEAN bit will be zero, so the next reboot will trigger a resync to guarantee the array is in sync. As far as the md driver knows, you could have been doing heavy i/o when the system went down--leaving the array out of sync. There is possibly a way to set SB_CLEAN during long idle periods, but then it would have to be cleared before doing any more i/o (i/o which might get interrupted.) <>< Lance. Sam Horrocks wrote: > I agree, if the two disks are truly out of sync > then the only thing you can do is copy the most recent > data to the out of date disk. > > But what I'm seeing is that the two disks are in > sync (at least according to the serial numbers in the > superblock), but due to the SB_CLEAN flag not having > been set to true, the code decides to do a resync anyways, > regardless of the fact that both discs are apparently > in-sync. > > And this resync is dangerous - it copies over good data.
Re: Raid1 - dangerous resync after power-failure?
It is a very bad idea to prevent resyncs after a volume has possibly become out of sync. It is important to have the disks in sync--even if the data is the wrong data. The way raid-1's read balancing works, you don't know which disk will be read: for the same block, the system may read different disks at different times. This type of inconsistency is worse than starting with bad data. Fsck can correct most inconsistencies in the data, but if differing data is hidden on another mirror, fsck cannot do anything about it, and the bad data will eventually surface. Also, with raid-5, if the array is not in sync and then has a disk failure (thus going into degraded mode), the data reconstructed using the bad parity will be bad data.

<>< Lance.

Sam wrote:
> OK, regardless of how the failure occurs, my point is that a
> resync is a potentially dangerous operation if you don't
> know beforehand whether the source disk has bad sectors or not.
> So I don't think a resync should be performed except when
> absolutely necessary, or unless the source disk is known to
> be absolutely free from errors.
>
> Can someone answer my original question which was:
>
> Could the SB_CLEAN flag be eliminated to reduce the
> risk of a resync damaging good data?
Re: reconstruction problem.
> > i have set up an md (raid1) device. it has two hard disks.
>
> Something has gone bad on the disks, such
> that whenever I do a raidstart or mkraid, it
> says
> raid set md0 not clean. starting background reconstr.. ..
>
> what can I do to clean my md device.

If the raid device isn't stopped correctly, it will be dirty and will require a resync at the next startup. A reconstruction is also done when the array is initially created. This reconstruction step is actually unnecessary for RAID-1 (though it is vital for RAID-5), but the raid driver does it anyway. Also, you must allow a resync to finish completely before stopping the array; otherwise, the resync will start all over again the next time the array is started.

> mkraid --really-force also is not helpful .
> i have tried destroying both hard disk partitions using fdisk
> and doing a clean raid setup, still it starts the background
> reconstruction.
>
> also what is the command to do a low level format of the harddisk
> in linux?

There is no built-in command to format a disk. You can easily zero out the data with something like 'dd if=/dev/zero of=/dev/sda'. If you are using scsi, there is a scsiformat utility in the scsiinfo package, but it may need fiddling to compile.

<>< Lance.
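A minimal sketch of the zeroing step, run here against a scratch file rather than a real disk (the path and sizes are illustrative; pointing of= at a real device destroys its contents):

```shell
# Zero out a "device" -- a scratch file stands in for /dev/sdXN here.
# On a real disk this wipes partition tables, superblocks, and data.
target=/tmp/fake-disk.img

dd if=/dev/zero of="$target" bs=4096 count=16 2>/dev/null

# The file is now 64KB of zero bytes.
ls -l "$target"
```

Note that the md persistent superblock is stored near the end of the device, so zeroing only the first few blocks of a partition may leave an old superblock intact.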
Re: SV: SV: raid5: bug: stripe->bh_new[4]
Johan,

Thanks for sending the bulk information about this bug. I have never seen the buffer bug when running local loads, only when using nfs. The bug appears more often when running with 64MB of RAM or less, but it has been seen when using more memory.

Below is a sample of the errors seen while doing tests. Very interesting is that the same sector (26272) hit the problem twice within 5 minutes, each time with different buffers. These all look like potential data corruption, since multiple buffers are assigned to the same physical block. I have seen corruption, but it seems to be caused by the nfs client, not the server side.

Hopefully, this problem will get resolved soon, but it looks like it has been with us for some time now (2 years).

<>< Lance.

Mar 1 22:33:10 src@lance-v raid5: bug: stripe->bh_new[2], sector 26272 exists
Mar 1 22:33:10 src@lance-v raid5: bh c100b680, bh_new c0594bc0
Mar 1 22:37:32 src@lance-v raid5: bug: stripe->bh_new[2], sector 26272 exists
Mar 1 22:37:32 src@lance-v raid5: bh c2d1be60, bh_new c1edcea0
Mar 1 22:42:41 src@lance-v raid5: bug: stripe->bh_new[3], sector 360880 exists
Mar 1 22:42:41 src@lance-v raid5: bh c1777840, bh_new c180
Mar 2 03:26:37 src@lance-v raid5: bug: stripe->bh_new[2], sector 1792 exists
Mar 2 03:26:37 src@lance-v raid5: bh c0549240, bh_new c0ed30c0
Mar 2 09:07:38 src@lance-v raid5: bug: stripe->bh_new[0], sector 293016 exists
Mar 2 09:07:38 src@lance-v raid5: bh c20150c0, bh_new c2015600
Mar 2 14:10:08 src@lance-v raid5: bug: stripe->bh_new[2], sector 42904 exists
Mar 2 14:10:08 src@lance-v raid5: bh c084c5c0, bh_new c262b8a0
Re: still get max 12 disks limit
Perhaps if you also modify MAX_REAL in the md_k.h file to 15, it will accept more than 12 disks. This value is only used by raid0.

<>< Lance.

[EMAIL PROTECTED] wrote:
> i tried this but mkraid still gives the same error of "a maximum of 12
> disks is supported."
> i set MD_SB_DISKS_WORDS to 480 to give me 15 disks.
> does anyone have more detailed instructions?
>
> looking in parser.c i see where that should have worked, but i guess i'm
> missing something. it parses down to /dev/hdn which is my 13th disk and
> then gives the error, so somehow MD_SB_DISKS is still 12 not 15. weird.
> i did a "make install" for raidtools and verified that it updated the
> binaries.
>
> On Tue, 29 Feb 2000, TAKAMURA Seishi wrote:
>
> > Dear Eldon,
> >
> > You seem to use RAID0, and I use RAID5, so just FYI. I changed both
> > kernel code and raidtool code to increase disk limit. Quick and dirty
> > way (which I did) is modify MD_SB_DISKS_WORDS appropriately in the
> > following two header files.
> > raidtools-0.90/md-int.h
> > linux/include/linux/raid/md_p.h
> > (MD_SB_DISKS_WORDS/32 = maximum drive number)
> >
> > With this modification, I am now using an array with 24 disks(1.0TB).
> >
> > > On Mon, 28 Feb 2000 16:52:16 -0600 (CST)
> > > [EMAIL PROTECTED] said:
> > >
> > > I've been told (by Jakob) that the limit of 12 disks per md device is just
> > > a typo in the code. I'm trying to make a 14-drive linear array. I tried
> > > changing MAX_REAL from 12 to 14 in md_k.h (and then even recompiled
> > > raidtools) but mkraid still complains about the limit being 12. is there
> > > any way around this (safely)?
> > >
> > > ps i also tried nesting 2 md's inside one; locked up the machine, so i
> > > don't feel good about that approach.
> > >
> > > i'm using 2.2.15
> > >
> > > Eldon
> >
> > Seishi Takamura, Dr.Eng.
> > NTT Cyber Space Laboratories
> > Y922A 1-1 Hikarino-Oka, Yokosuka, Kanagawa, 239-0847 Japan
> > Tel: +81-468-59-2371, Fax: +81-468-59-2829
> > E-mail: [EMAIL PROTECTED]
> >
Re: [FAQ-answer] Re: soft RAID5 + journalled FS + power failure =problems ?
Ingo,

I can fairly regularly generate corruption (data or ext2 filesystem) on a busy RAID-5 by adding a spare drive to a degraded array and letting it rebuild the parity. Could the problem come from the bad (illegal) buffer interactions you mentioned, or are there other areas that need fixing as well? I have been looking into this issue for a long time without resolution. Since you may be aware of possible problem areas: any ideas, code or encouragement is greatly welcome.

<>< Lance.

Ingo Molnar wrote:
> On Wed, 12 Jan 2000, Gadi Oxman wrote:
>
> > As far as I know, we took care not to poke into the buffer cache to
> > find clean buffers -- in raid5.c, the only code which does a find_buffer()
> > is:
>
> yep, this is still the case. (Sorry Stephen, my bad.) We will have these
> problems once we try to eliminate the current copying overhead.
> Nevertheless there are bad (illegal) interactions between the RAID code
> and the buffer cache, i'm cleaning up this for 2.3 right now. Especially
> the reconstruction code is a rathole. Unfortunately blocking
> reconstruction if b_count == 0 is not acceptable because several
> filesystems (such as ext2fs) keep metadata caches around (eg. the block
> group descriptors in the ext2fs case) which have b_count == 1 for a longer
> time.
Re: large ide raid system
SCSI works quite well with many devices connected to the same cable. With the faster scsi modes, the PCI bus turns out to be the bottleneck, so it doesn't matter how many channels you have. The original poster wasn't interested in performance, but if performance were the issue, multiple channels would improve it when the slower (single-ended) devices are used.

<>< Lance

Dan Hollis wrote:
> Cable length is not so much a pain as the number of cables. Of course with
> scsi you want multiple channels anyway for performance, so the situation
> is very similar to ide. A cable mess.
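For a rough sense of why the bus saturates first, here is the back-of-the-envelope arithmetic (theoretical peak numbers of that era, not measurements):

```shell
# 32-bit, 33MHz PCI moves at most 32 bits per clock:
echo $(( 32 / 8 * 33 ))    # ~132 MB/s theoretical PCI peak

# Two Ultra2 SCSI channels (80 MB/s each) already exceed that:
echo $(( 2 * 80 ))         # 160 MB/s of SCSI bandwidth
```

So a second fast channel on the same 32-bit/33MHz PCI bus cannot deliver its full bandwidth.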
Re: Swapping Drives on RAID?
Scott,

1. Use raidhotremove to take out the IDE drive. Example:

    raidhotremove /dev/md0 /dev/hda5

2. Use raidhotadd to add the SCSI drive. Example:

    raidhotadd /dev/md0 /dev/sda5

3. Correct your /etc/raidtab file with the changed device.

<>< Lance.

Scott Patten wrote:
> I'm sorry if this is covered somewhere. I couldn't find it.
>
> 1 - I have a raid1 consisting of 2 drives. For strange
> historical reasons one is SCSI and the other IDE. Although
> the IDE is fairly fast the SCSI is much faster and since I
> now have another SCSI drive to add, I would like to replace
> the IDE with the SCSI. Can I unplug the IDE drive, run in
> degraded mode, edit the raid.conf and somehow mkraid
> without loosing data or do I need to restore from tape.
> BYW, I'm using 2.2.13ac1.
>
Re: new raid5 says overlapping physical units....
Roland,

The messages are not to be feared. To prevent thrashing a drive between multiple resync processes, the raid resync routine checks to see if any of the disks in the array are already active in another resync. If so, it waits for the other process to finish before starting. Thus, resync processes are serialized when a disk is shared between raid arrays.

<>< Lance.

Roland Roberts wrote:
> I "recently" installed stock RedHat 6.1 and configured with root RAID1
> and everything else RAID5. I have 4 U2 LVD SCSI drives on two
> controllers. RedHat plays games with the partition layouts when I try
> to use its graphical tool, so I ended up partitioning the disks with
> fdisk.
>
> After I first installed RedHat and allowed it to build the RAID
> devices from my individual partitions, I got the following disturbing
> messages in syslog:
>
> Dec 17 18:41:30 kernel: md: serializing resync, md5 has overlapping physical units with md6!
> Dec 17 18:41:30 kernel: md: serializing resync, md4 has overlapping physical units with md6!
> Dec 17 18:41:31 kernel: md: serializing resync, md3 has overlapping physical units with md6!
Re: Adding a spare-disk (continued)
Hi,

By the mdstat shown below, you have a 3 drive raid-5 device with one spare. The [0], [1] and [2] indicate the raid roles of the associated disks. Values of [3] or higher are the spares (for a three disk array). In general, for an 'n' disk raid array as shown in /proc/mdstat, [0]..[n-1] are the disks in the array holding data, and [n] and above are the spares. You are in good shape for hda2 to kick in as the spare if one of the other disks fails.

<>< Lance.

Johan Ekenberg wrote:
> I recently inquired about adding a spare-disk to an operating RAID-5 array,
> and was given the advice to use raidhotadd. I've tried this and want to make
> sure that the result is the one I should expect. I thought that spare disks
> would show up as an "unused device" in /proc/mdstat, but that may not be the
> case???
>
> This is my mdstat:
> Personalities : [linear] [raid0] [raid1] [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 hda2[3] sdc2[2] sdb2[1] sda2[0] 8305408 blocks level 5,
> 32k chunk, algorithm 2 [3/3] [UUU]
> unused devices:
>
> The spare disk in this case is hda2[3], defined as a spare in /etc/raidtab.
> Is this the way it should look? Can I be confident that hda2 will kick in if
> one of the sd* fails? hda2 is of course formated exactly like the other
> partitions.
Re: Help:Raid-5 with 12 HDD now on degrade mode.
Makoto,

The normal raid driver only handles 12 disk entries (or slots). Unfortunately, a spare disk counts as another disk slot, and you need a spare slot to rebuild the failed disk. But with your setup of a 12 disk raid 5, you have already used all the available disk slots. To recover your 12 disk raid 5 system, you will need to modify your kernel and raid tools to accommodate more disks. Fortunately, the 12 disk limit comes from an erroneous calculation, and there is room for many more disks (I don't remember the actual limit, but it is over 24). There has been some talk of this subject in the past. If you look in the list archive for the thread "the 12 disk limit" there is some information on what needs to be done to modify the kernel.

This brings up a question though: can an existing 12-disk-limited raid superblock work with a kernel that supports more than 12 disks? I'd think so, since the unused areas are zeroed out. I don't know of anybody who has tried it though.

The tools should, but don't, limit the number of devices in a raid 5 array to one less than the maximum disk slots in the raid superblock, so that the last slot can be used as a spare. Unfortunately, you ran into this trap.

Good luck,

<>< Lance.

Makoto Kurokawa wrote:
> Hello, All.
>
> I have a trouble of HDD fail of raid-5, raid-0.90 on Redhat 6.0.
>
> Raid-5 is now working on degrade mode.
> Exactly, I can't repair or replace the failed HDD (to new HDD).
> Would you tell me how to do recovery it?
>
> "/proc/mdstat" is as follows:
>
> [root@oem /root]# cat /proc/mdstat
> Personalities : [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 sdm1[11] sdl1[10] sdk1[9] sdj1[8] sdi1[7] sdh1[6] sdg1[5]
> sdf1[4] sde1[3] sdd1[2] sdc1[1] 97192128 blocks level 5, 4k chunk, algorithm 2
> [12/11] [_UUU]
> unused devices:
>
> "sdb1[0]" is failed, I think.
> "/etc/raidtab" is as follows:
>
> # Sample raid-5 configuration
> raiddev /dev/md0
> raid-level 5
> nr-raid-disks 12
> chunk-size 4
>
> # Parity placement algorithm
> #parity-algorithm left-asymmetric
>
> # the best one for maximum performance:
> parity-algorithm left-symmetric
>
> #parity-algorithm right-asymmetric
> #parity-algorithm right-symmetric
>
> # Spare disks for hot reconstruction
> #nr-spare-disks 0
>
> device /dev/sdb1
> raid-disk 0
> device /dev/sdc1
> raid-disk 1
> device /dev/sdd1
> raid-disk 2
> device /dev/sde1
> raid-disk 3
> device /dev/sdf1
> raid-disk 4
> device /dev/sdg1
> raid-disk 5
> device /dev/sdh1
> raid-disk 6
> device /dev/sdi1
> raid-disk 7
> device /dev/sdj1
> raid-disk 8
> device /dev/sdk1
> raid-disk 9
> device /dev/sdl1
> raid-disk 10
> device /dev/sdm1
> raid-disk 11
>
> First, I restarted the PC and tried "raidhotadd" and "raidhotremove"; the
> result is as follows:
>
> [root@oem /root]# raidhotadd /dev/md0 /dev/sdb1
> /dev/md0: can not hot-add disk: disk busy!
>
> [root@oem /root]# raidhotremove /dev/md0 /dev/sdb1
> /dev/md0: can not hot-remove disk: disk not in array!
>
> Next, I replaced HDD /dev/sdb with a new HDD; the result: the system hung up on
> boot time, with the message "/dev/md0 is invalid."
>
> What should I do to recover the Raid-5 from degrade mode to normal mode?
>
> Makoto Kurokawa
> Engineer, OEM Sales Engineering
> Storage Products Marketing, Fujisawa, IBM-Japan
> Tel:+81-466-45-1441 FAX:+81-466-45-1045
> E-mail:[EMAIL PROTECTED]
Re: kernel SW-RAID implementation questions
There is a constant specifying the maximum number of md devices. But, there is no variable stating how many active md devices are around. This wouldn't make much sense anyway since the md devices are not allocated sequentially. You can start with md3, for example. You can have a program analyze the /proc/mdstat file to see what md device numbers are currently active and thus not available for new devices. <>< Lance. Thomas Waldmann wrote: > > Is there a variable containing the md device count (md0, md1, ..., mdn. n == ?) > ?
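One way to discover the active md devices, as suggested above, is to parse /proc/mdstat. A rough sketch (the mdstat sample here is illustrative; on a real system you would feed it /proc/mdstat itself):

```shell
# Print the names of active md devices from mdstat-formatted input.
list_md() {
  awk '/^md[0-9]+ : active/ { print $1 }'
}

# Fed a captured sample here; use "list_md < /proc/mdstat" for real.
list_md <<'EOF'
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0] 8305408 blocks [2/2] [UU]
md3 : active raid5 sdd1[2] sdc1[1] sdb1[0] 16610816 blocks [3/3] [UUU]
unused devices: <none>
EOF
```

This prints md0 and md3, showing that the other md device numbers are free for new arrays.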
Re: Tuning readahead
Attached is a program that will let you get or set the read ahead value for any major device. You can easily change the value and then do a performance test. <>< Lance. Jakob Østergaard wrote: > > Hi all ! > > I was looking into tuning the readahead done on disks in a RAID. > It seems as though (from md.c) that levels 0, 4 and 5 are handled > in similar ways. > > The readahead is set to chunk_size*4 per disk, and then increased > to 1024*MAX_SECTORS = 1024*128 = 128k if the above equation yielded > a result lower than this. > > So besides from changing the chunk size to something bigger, is there > any way the readahead can be tuned ? Should (and could I safely) just > change the equation in md.c ? readahead.c
Re: Uping the limit of drives in a single raid.
Jakob Østergaard wrote: > IIRC the 12 disk limit is a ``feature''. Actually you can have up to 15 disks. Simply > grep for the 12 disk constant in the raidtools and flip it up to 15. You can't go > further than that though. I forget the exact number, but Ingo said that the drive count can be changed to (23-28?) -- somewhere in there, and that change may precede the 250 disk limit. <>< Lance.
Re: Slower read access on RAID-1 than regular partition
Optimizing the md driver for Bonnie, IMHO, is foolishness. Bonnie is a sequential read/write test and does not produce numbers that mean much for typical data access patterns. Example: the read_ahead value is bumped way up (1024), which kills performance when doing more normal accesses. Linux's average contiguous data request size is much smaller than 512kb. Yes, this makes Bonnie look better, but not a real working system. It is nice to have high Bonnie results, but not at the expense of a working system. I wish I knew of a more statistically oriented data access test like Netbench, but on the server side. The reason Bonnie is so popular is that it is easy (and cheap).

In the Raid1 case, a Bonnie test will not highlight the advantages of read balancing. Someone can tune the chunk size to work best with Bonnie, but the best chunk size for a Raid1 test on Bonnie will most likely be a bad choice for a normally operating system.

Please don't think that Bonnie results always mean much. They are fun to compare, but be careful in how the numbers are interpreted.

<>< Lance.

[EMAIL PROTECTED] wrote:
>
> On Wed, 15 Sep 1999, James Manning wrote:
>
> > > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
> > > Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
> > > md0 192 5933 86.4 15222 21.8 4172 11.8 5672 81.3 9014 11.2 218.4 4.6
> > > sd0 192 6411 92.0 15072 18.5 4265 11.7 5760 80.6 12069 13.1 201.8 4.5
> >
> > More cases with faster write access (significantly) than read... am I
> > wrong in thinking this is strange? Is bonnie really worth trusting?
> > Is there a better tool currently available?
>
> bonnie is the main benchmark i'm optimizing for. hdparm -tT is rather
> useless in this regard, it has only a relevance on maybe e2fsck times.
>
> i'll have a look at RAID1 read balancing. I once ensured we read better
> than single-disk, but we might have lost this property meanwhile ...
>
> -- mingo
Re: the 12 disk limit
Lawrence,

If you don't care about being 'standard', there is plenty of fluff in the superblock to make room for more disks. I don't know how well behaved all the tools are at using the symbolic constants though.

To support 18 devices, you will need to allow at least 19 disks (one for the spare/replacement), but I like using even numbers, so round it up to 20. Here is what you'd have to change.

In linux/include/linux/raid/md_k.h:

    #define MAX_REAL 20

Change lines in linux/include/linux/raid/md_p.h so it reads something like...

    #define MD_SB_DESCRIPTOR_WORDS 32
    #define MD_SB_DISKS 20
    #define MD_SB_DISK_WORDS (MD_SB_DESCRIPTOR_WORDS * MD_SB_DISKS)

These have to be above the line with MD_SB_RESERVED_WORDS.

NOTE: the md driver primarily uses MD_SB_DISKS for the maximum disk count. The MAX_REAL value is also used (twice), but it could just as well have used the MD_SB_DISKS value. Oh well.

And recompile--both the kernel and the tools. Try it out, and let us know if the tools work.

<>< Lance.

Lawrence Dickson wrote:
>
> All,
> I guess this has been asked before, but - when will the RAID
> code get past the 12 disk limit? We'd even be willing to use
> a variant - our customer wants 18 disk RAID-5 real bad.
> Larry Dickson
> Land-5 Corporation
Re: Why RAID1 half-speed?
Hi Mike,

You are using a very small chunk size. Increase this number to 128. I think you may need to remake the array though. This is kind of silly, since in RAID-1 the data isn't laid out any differently for different chunk sizes, as it is for the other raid personalities. It would be nice to be able to just edit the raidtab file and have it automagically change.

The significance of the chunk size is this: the RAID-1 personality has a read balancing mechanism that tries to keep using the same drive so long as the requests are sequential and not bigger than the chunk size. With a chunk size of 4, the raid driver is breaking the read requests into 4KB chunks and then switching to the next disk, which is hardly optimal. A value of 128 is much better. For bonnie tests, the larger the better, but bonnie is not a real world test. I find 128 a good compromise.

<>< Lance.

Mike Black wrote:
>
> I'm a little confused on RAID1...running 2.2.11 with
> raid0145-19990824-2.2.11.bz2 on a PII/233
>
> I just set up a mirror this weekend on an IDE RAID1 - two 5G disks on the
> same IDE bus (primary and master).
>
> I was under the impression that I shouldn't see any slowdown and maybe even
> a speedup but, alas, it is not so.
>
> Here's the hdparm test (ran several times -- similar results each time):
>
> /dev/hda:
> Timing buffer-cache reads: 64 MB in 0.95 seconds = 67.37 MB/sec
> Timing buffered disk reads: 32 MB in 3.28 seconds = 9.76 MB/sec
>
> /dev/md0:
> Timing buffer-cache reads: 64 MB in 0.85 seconds = 75.29 MB/sec
> Timing buffered disk reads: 32 MB in 6.10 seconds = 5.25 MB/sec
>
> It looks like I've lost half of the bandwidth on disk reads. Did I miss
> something?? Here's the raidtab entry:
>
> raiddev /dev/md0
> raid-level 1
> nr-raid-disks 2
> nr-spare-disks 0
> persistent-superblock 1
> chunk-size 4
>
> device /dev/hda1
> raid-disk 0
> device /dev/hdb1
> raid-disk 1
>
> Michael D. Black Principal Engineer
> [EMAIL PROTECTED] 407-676-2923,x203
> http://www.csi.cc Computer Science Innovations
> http://www.csi.cc/~mike My home page
> FAX 407-676-2355
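For reference, a raidtab along these lines would request the larger chunk size (device names taken from the original setup; the array must be re-made with mkraid for the new chunk size to take effect):

```
raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        nr-spare-disks          0
        persistent-superblock   1
        chunk-size              128

        device                  /dev/hda1
        raid-disk               0
        device                  /dev/hdb1
        raid-disk               1
```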
Re: seeking advice for linux raid config
James,

There are currently 128 possible SCSI disk devices allocated in the device map--see linux/Documentation/devices.txt. Each of these supports partitions 1..15 (the lower 4 bits of the minor number), with 0 being the raw device, and the base devices are mapped into various places among the major numbers. There is a slight chance of modifying things so that you have fewer partition bits and give those unused bits to the base scsi devices. I don't know how well disciplined the scsi code is in using the conversion macros from device and partition to device number.

You have another problem with the md driver (raid). Its superblock is coded to allow raid sets of up to 11 devices (12 if you count the spare). This is a #define set to 12. You should be able to increase this value to 16 and recompile the kernel and tools.

I have heard of a large file patch for the ext2 filesystem that you may be able to use. FYI: the ext2 filesystem is limited to a 1-Terabyte maximum per volume.

<>< Lance.

[EMAIL PROTECTED] wrote:
>
> > The Software RAID solution will give you all the flexibility you need.
> > If you have already considered it, and discarded it as an option for
> > some reason, I'd be grateful to know about that reason.
>
> The 16-scsi-drive limitation that existed (at least at one time).
> While the limit may be higher now, being over 240 (ideally 256 minimum)
> seems unlikely (would require 16 device majors afaict, at least with
> the current partition/minor config). If this limitation is gone, I
> would *love* to do pure s/w raid, that's for sure...
>
> James
> --
> Miscellaneous Engineer --- IBM Netfinity Performance Development
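The partition-bits layout described above means each SCSI disk consumes 16 consecutive minor numbers. A quick sketch of the standard Linux sd arithmetic:

```shell
# minor = disk_index * 16 + partition; partition 0 is the whole disk.
sd_minor() { echo $(( $1 * 16 + $2 )); }

sd_minor 0 0   # /dev/sda  -> 0
sd_minor 0 1   # /dev/sda1 -> 1
sd_minor 2 3   # /dev/sdc3 -> 35
```

With 4 partition bits, one 8-bit minor space holds 256/16 = 16 disks per major number, which is why reaching hundreds of disks would require many more majors.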
Re: RAID-0 Slowness
Mark,

Having a very large chunk size would reduce the performance down close to that of a single device. Two performance factors to keep in mind: access time and throughput. Access time is important for the many small files and accesses, and throughput is needed for large requests. Mixed in with these factors are the request overhead latency, the average seek/access time, the sustained throughput of a single device, and the size of the device's cache buffer.

In setting the chunk size, I suppose there may be two schools of thought: first, make the chunk size large enough that one spindle can handle an entire request--freeing the other spindles to work on other areas (this improves access time); or secondly, have all the spindles working in parallel on each request (this increases throughput). And a third strategy: set the chunk size to work well with both large and small requests. On a typical system, most requests (and files) are small (< 4KB), but there are many larger requests (> 256KB) that load in object code.

I suggest a chunk size of around 64(KB), since it gives better access times for small requests and also adds throughput for larger requests (by spreading them across spindles). 128KB may work just as well, but this exceeds the size of some cache buffers, and some device drivers cannot request more than 64KB in one request.

My two cents worth.

<>< Lance.

Marc Mutz wrote:
> D. Lance Robinson wrote:
> >
> > Try bumping your chunk-size up. I usually use 64. When this number is low,
> > you cause more scsi requests to be performed than needed. If really big (
> > >=256 ) RAID 0 won't help much.
> >
> What if the chunk size matches ext2fs's group size (i.e. 8M)? This would
> give very good read/write performance with moderately large files (i.e.
> <8M) if multiple processes do access the fs, because ext2fs usually
> tries to store a file completely within one block group. The performance
> gain would be n-fold, if n was the number of disks in the raid0 array
> and the number of processes was higher than that.
> It would give only single-speed (so to speak) for any given application,
> though.
> But then: Wouldn't linear append be essentially the same, given that
> ext2fs spreads files all across the block groups from the beginning?
>
> Would that not be the perfect setup for a web server's documents volume,
> with MinServers==n? The files are usually small and there are usually
> much more than n servers running simultaneously.
>
> Is this analysis correct or does it contain flaws?
> What be the difference between raid0 with 8M chunks and linear append?
>
> Just my thoughts wandering off...
>
> Marc
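To make the tradeoff concrete, here is the simple arithmetic behind the two schools of thought (illustrative numbers; assumes requests start on a chunk boundary):

```shell
# How many spindles does one request touch, for a given chunk size?
# Usage: spindles_touched request_kb chunk_kb  (ceiling division)
spindles_touched() { echo $(( ($1 + $2 - 1) / $2 )); }

spindles_touched 4 64     # small 4KB request: 1 spindle, others stay free
spindles_touched 256 64   # large 256KB request: 4 spindles in parallel
```

With a 64KB chunk, small requests are served by a single disk (good access time) while large requests still fan out across the array (good throughput).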
Re: RAID-0 Slowness
Try bumping your chunk-size up. I usually use 64. When this number is low, you cause more scsi requests to be performed than needed. If it is really big (>=256), RAID-0 won't help much.

<>< Lance.

Richard Schroeder wrote:
> Help,
> I have set up RAID-0 on my Linux Redhat 6.0. I am using RAID-0
> (striping) with two IDE disks (each disk on its own IDE controller).
> No problems in getting it running. However, my tests show I/O
> performance seems to be worse than on a "normal" non-RAID filesystem. I
> have tried different chunk-sizes to no avail. I must be missing
> something. Shouldn't I be seeing a slight performance gain?
>
> Here is my /etc/raidtab:
> raiddev /dev/md0
> raid-level 0
> nr-raid-disks 2
> nr-spare-disks 0
> chunk-size 4
> persistent-superblock 1
> device /dev/hda8
> raid-disk 0
> device /dev/hdc8
> raid-disk 1
>
> Curious
>
> Richard Schroeder
> [EMAIL PROTECTED]
Re: What hardware do you recommend for raid?
Hi Lucio, Lucio Godoy wrote: > > The idea of using raid is to add more disks onto the scsi > controler (Hot adding ?) when needed and combine the newly > added disk to the previous disks as one physical device. > > Is it possible to add another disk without having to switch of the > machine? There are special disk enclosures that allow you to add new scsi disks into the drive bays without turning the power off. HOWEVER, the RAID device driver does not allow you to add a disk to enlarge the raid device's size. Hot adding is only used for replacement of a faulty device. > is it possible to combine that newly added disk to the previous physical > device? Not to make it bigger as stated above. If you want to enlarge a device using RAID level 0, 4 or 5, you will need to: * backup your data. * verify your backup is okay. * add the disk. * create a new RAID device (mkraid) * restore your backup. <>< Lance.
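The list above, spelled out as a rough outline (device names, mount points, and backup paths are examples only; this is a sketch, not meant to be run verbatim):

```
# back up the data and verify that the archive reads back
tar czf /backup/md0.tar.gz -C /mnt/md0 .
tar tzf /backup/md0.tar.gz > /dev/null

# stop the array, physically add the new disk, and add it to /etc/raidtab
umount /mnt/md0
raidstop /dev/md0

# recreate the array -- this destroys the old contents!
mkraid /dev/md0
mke2fs /dev/md0

# restore
mount /dev/md0 /mnt/md0
tar xzf /backup/md0.tar.gz -C /mnt/md0
```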
Re: How to read /proc/mdstat
To identify the spare devices through /proc/mdstat...

1) Look for the [#/#] value on a line. The first number is the number of devices a complete raid set requires, as defined. Let's say it is 'n'.
2) The raid role numbers [#] following each device indicate its role, or function, within the raid set. Any device with a role number of 'n' or higher is a spare disk; 0, 1, ..., n-1 are the working array.

Also, if you have a failure, the failed device will be marked with (F) after the [#]. The spare that replaces this device will be the device with the lowest role number n or higher that is not marked (F). Once the resync operation is complete, the devices' role numbers are swapped. Don't count on the order in which the devices appear in the /proc/mdstat output. <>< Lance. Osma Ahvenlampi wrote: > > This is the /proc/mdstat output on a particular kernel 2.0.36 + > raid0145-19990421 system equipped with six SCSI disks, configured as > (multiple) 5-disk RAID-5 plus one hot spare disk. However, it's not > immediately obvious to me from the output WHICH of the disks is the > spare (I know that it's /dev/sdf, since that's the one I added as > spare after creating the array with no spare disk, but what if I > didn't know that?). > > My motivation to ask this is actually so that I might be able to > decide whether I could tell the spare disk to spin down, since it's > not it use. No point having it spinning wearing itself down when the > point of it is to work in case one of the others fail.
> > # cat /proc/mdstat > Personalities : [raid1] [raid5] [translucent] > read_ahead 1024 sectors > md0 : active raid1 sdb2[1] sda2[0] 64192 blocks [2/2] [UU] > md1 : active raid5 sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1] sda5[0] 706304 blocks >level 5, 32k chunk, algorithm 2 [5/5] [U] > md2 : active raid5 sdf6[5] sde6[4] sdd6[3] sdc6[2] sdb6[1] sda6[0] 1959424 blocks >level 5, 32k chunk, algorithm 2 [5/5] [U] > md3 : active raid5 sdf7[5] sde7[4] sdd7[3] sdc7[2] sdb7[1] sda7[0] 1959424 blocks >level 5, 32k chunk, algorithm 2 [5/5] [U] > md4 : active raid5 sdf8[5] sde8[4] sdd8[3] sdc8[2] sdb8[1] sda8[0] 30587136 blocks >level 5, 32k chunk, algorithm 2 [5/5] [U] > unused devices: > > -- > Osma Ahvenlampi
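As a concrete sketch, here is a small shell helper (hypothetical name; it ignores the (F) failed-disk marker) that applies steps 1 and 2 above to a single /proc/mdstat device line: extract n from the [n/m] field, then report members whose role number is n or higher.

```shell
# List the spare members of one /proc/mdstat device line.
# Role numbers >= n (taken from the [n/m] field) are spares.
mdstat_spares() {
  line="$1"
  # n = first number of the [n/m] pair
  n=$(echo "$line" | sed -n 's/.*\[\([0-9]*\)\/[0-9]*\].*/\1/p')
  for tok in $line; do
    case "$tok" in
      *\[[0-9]*\]*)
        dev=${tok%%\[*}                      # device name before '['
        role=${tok##*\[}; role=${role%%\]*}  # role number inside '[...]'
        [ -n "$dev" ] && [ "$role" -ge "$n" ] 2>/dev/null && echo "$dev"
        ;;
    esac
  done
}
```

Applied to the md1 line above it reports sdf5, matching what Osma already knew about /dev/sdf.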
Re: raid1 on ide decreases read performance
Don't start to think that Bonnie gives real world performance numbers. It gives single tasking sequential access throughput values. Sure Bonnie's numbers have some value, but don't think that its results match typical system access patterns. The performance difference with Raid-1 is seen when doing several io bound tasks simultaneously. Bonnie doesn't come close to doing this. <>< Lance. [EMAIL PROTECTED] wrote: > Yes, I guess you're right that the way raid-1 stripes the reads doesn't > necessarily yield higher read performance after all... Here's a little > test I did: > > raid-0 on two disks: > ---Sequential Output ---Sequential Input-- --Random-- > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- > MachineMB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > 900 6160 97.1 21710 73.3 8559 52.5 7841 94.2 23977 63.9 157.3 5.5 > > raid-1 on the same disks: > ---Sequential Output ---Sequential Input-- --Random-- > -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks--- > MachineMB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU > 470 5801 94.6 11719 39.5 5264 32.5 6931 83.0 11861 34.8 167.4 4.7 > > Hmm I know that raid-1 does distribute the reads to both disks, so I would > think that read-performance should increase. But it seems like it doesn't. At > least not in this case. Btw. the disks where on separate SCSI controllers. >
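One way to see what Bonnie hides is to time several concurrent sequential readers and compare the aggregate; this multi-stream load is where RAID-1 read balancing can pull ahead. A sketch (function name, block sizes, and counts are arbitrary):

```shell
# Start four concurrent sequential readers at different offsets and
# wait for all of them to finish.
parallel_read_test() {  # arg: device or file to read
  for i in 1 2 3 4; do
    dd if="$1" of=/dev/null bs=64k count=64 skip=$(( i * 64 )) 2>/dev/null &
  done
  wait
}
```

Compare `time parallel_read_test /dev/md0` against the same command on a single plain disk.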
Re: raid1 on ide decreases read performance
> > > > The bottom line: Read performance for a RAID-1 device is better than a > > single (JBOD) device. The bigger the n in n-way mirroring gives better > > read performance, but slightly worse write performance. > > > But using n-way mirrors will also increase cpu utilization during reads > - > or am I wrong? - because of the cycling process. CPU utilization is not increased for reads by a higher n in n-way mirroring. Only one device is asked for the data, and the overhead for the balancing is small. If CPU utilization goes up while reading, it is because your throughput is higher :-) Memory bus utilization (and thus CPU utilization) is increased for writes as n grows in n-way mirroring, because the data is copied across the memory bus n times. This is true with both IDE and SCSI. <>< Lance.
Re: raid1 on ide decreases read performance
Osma, RAID-1 does read balancing, which may(?) be better than striping. Each read request is checked against the previous one: if it is contiguous with the previous request, the same device is used; otherwise the driver switches to the next mirror. This process cycles through the mirrors (n-way mirrors.) The bottom line: read performance for a RAID-1 device is better than a single (JBOD) device. A bigger n in n-way mirroring gives better read performance, but slightly worse write performance. <>< Lance. Osma Ahvenlampi wrote: > > Dietmar Stein <[EMAIL PROTECTED]> writes: > > Readperformance will only increase by using raid0 (stripe), but it will > > not be twice times faster. > > Does the Linux RAID-1 code still not stripe reads? I thought it did. > > -- > Osma Ahvenlampi
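In toy form, the cycling rule reads like this (hypothetical names and sector-based bookkeeping; the real logic lives in raid1.c and works on buffer heads):

```shell
# RAID-1 read-balancing sketch: stay on the same mirror for sequential
# reads, advance round-robin for non-sequential ones.
N_MIRRORS=2
last_end=-1   # sector just past the previous request
current=0     # mirror used for the previous request

pick_mirror() {  # args: start_sector length; result in $picked
  if [ "$1" -ne "$last_end" ]; then
    # not contiguous with the previous request: cycle to the next mirror
    current=$(( (current + 1) % N_MIRRORS ))
  fi
  last_end=$(( $1 + $2 ))
  picked=$current
}
```

Sequential requests stay on one mirror (so a single stream sees single-disk speed), while independent streams tend to spread across the mirrors.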
Re: Add expansion of exisiting RAID 5 config in software RAID?
The answer is still the same (May 1999). <>< Lance. Scott Smyth wrote: > > I would like to explore the requirements of expanding > RAID 0,4, and 5 levels from an existing configuration. > For example, if you have 3 disks in a RAID 5 configuration, > you currently cannot add a disk to the RAID 5 without > destructively remaking the RAID 5 configuration and reformatting > the multiple block device upon completion. Is anyone > working on (I remember it mentioned previously on the list) > what has been called "resize array" in the software RAID > howto in the wish list section. > > from RAID 5 FAQ: > >2.Q: Can I add disks to a RAID-5 array? > > A: Currently, (September 1997) no, not without erasing all data. > A conversion utility to allow this does not yet exist. The problem > is that the actual structure and layout of a RAID-5 array depends > on the number of disks in the array. Of course, one can add > drives by backing up the array to tape, deleting all data, creating > a new array, and restoring from tape. > > thanks, > Scott
Fix for /proc/mdstat & raidstop panic
Hi all, Attached is a fix for a problem that happens when /proc/mdstat is read while a raid device is being stopped. A panic could result. Not many users read /proc/mdstat often or stop a raid device manually, but this problem caused us many headaches. The problem happens something like...

1) raidstop is run.
2) The raidstop process removes the superblock structure from the raid device structure before the device is removed from the list of raid devices.
3) /proc/mdstat starts reading raid device structures and tries to read the superblock data that doesn't exist.
4) Panic.

Solution:
* Added a new semaphore that protects the all_mddevs list.
* Added lock and unlock code around each reference to the list.
* Needed to fix some other related semaphore use.
* Modified md_status so it will check for a null sb pointer; if found, a message like the following is given: md1 : inactive sb

Note: I could very quickly run into the problem before (in about 5 seconds.) I ran a few scripts to test the routine out and it started and stopped three independent raid arrays while another script just read /proc/mdstat. This ran for over 50,000 total start/stop cycles. For some reason, one of the raid devices got in the 'D' state. This seems unrelated to the given fix since down_interruptible is being used. Back to other things... <>< Lance.

--- linux-r16a/drivers/block/md.c	Tue May 11 00:05:30 1999
+++ linux/drivers/block/md.c	Thu Apr 29 23:23:53 1999
@@ -162,6 +162,18 @@
  */
 static MD_LIST_HEAD(all_mddevs);
 
+/*
+ * The all_mddevs_sem must be taken before modifying the all_mddevs list.
+ * It should only be needed when either adding or removing an mddev.
+ * You must NOT have any mddev->reconfig_sem locked while locking
+ * this semaphore.
+ */
+static struct semaphore all_mddevs_sem = MUTEX;
+
+/*
+ * Allocates an mddev structure.
+ * Returns: the pointer to the mddev_t structure which is locked.
+ */
 static mddev_t * alloc_mddev (kdev_t dev)
 {
 	mddev_t * mddev;
@@ -186,9 +198,13 @@
 	 * personalities can create additional mddevs
 	 * if necessary.
 	 */
+	lock_all_mddevs();
+	lock_mddev( mddev );
 	add_mddev_mapping(mddev, dev, 0);
 	md_list_add(&mddev->all_mddevs, &all_mddevs);
+	unlock_all_mddevs();
+	/* NOTE: this mddev is still locked! */
 	return mddev;
 }
@@ -208,9 +224,14 @@
 	while (md_atomic_read(&mddev->recovery_sem.count) != 1)
 		schedule();
+	unlock_mddev( mddev );
+	lock_all_mddevs();	/* lock the list */
+	lock_mddev( mddev );	/* Just in case we got blocked for all. */
+
 	del_mddev_mapping(mddev, MKDEV(MD_MAJOR, mdidx(mddev)));
 	md_list_del(&mddev->all_mddevs);
 	MD_INIT_LIST_HEAD(&mddev->all_mddevs);
+	unlock_all_mddevs();
 	kfree(mddev);
 }
@@ -1878,6 +1899,7 @@
 			md_list_del(&rdev->pending);
 			MD_INIT_LIST_HEAD(&rdev->pending);
 		}
+		unlock_mddev(mddev);
 		autorun_array(mddev);
 	}
 	printk("... autorun DONE.\n");
@@ -2556,14 +2586,6 @@
 				err = -ENOMEM;
 				goto abort;
 			}
-			/*
-			 * alloc_mddev() should possibly self-lock.
-			 */
-			err = lock_mddev(mddev);
-			if (err) {
-				printk("ioctl, reason %d, cmd %d\n", err, cmd);
-				goto abort;
-			}
 			err = set_array_info(mddev, (void *)arg);
 			goto done_unlock;
@@ -3189,6 +3221,7 @@
 		}
 		if (!mddev->pers) {
+			unlock_mddev( mddev );
 			sz += sprintf(page+sz, "\n");
 			continue;
 		}
@@ -3201,9 +3234,12 @@
 			if (md_atomic_read(&mddev->resync_sem.count) != 1)
 				sz += sprintf(page + sz, " resync=DELAYED");
 		}
+		unlock_mddev( mddev );
 		sz += sprintf(page + sz, "\n");
 	}
 	sz += status_unused (page + sz);
+
+	unlock_all_mddevs();
 	return (sz);
 }
--- linux-r16a/include/linux/raid/md_k.h	Tue May 11 00:05:30 1999
+++ linux/include/linux/raid/md_k.h	Thu Apr 29 23:23:52 1999
@@ -294,6 +294,13 @@
 	ITERATE_RDEV_GENERIC(pending_raid_disks,pending,rdev,tmp)
 
 /*
+ * It would be better for these to be inline, but all_mddevs_sem is static.
+ * This is the locking mechanism for the all_mddevs list.
+ */
+#define lock_all_mddevs()	down_interruptible( &all_mddevs_sem )
+#define unlock_all_mddevs()	up( &all_mddevs_sem )
+
+/*
  * iterates through all used mddevs in the system.
  */
 #define ITERATE_MDDEV(mddev,tmp)				\
Re: Swap on raid
Hi, You can run a system without a swap device. But if you do 'swapoff -a' _after_ a swap device failure, you are dead (if swap had any virtual data stored in it.) 'swapoff -a' copies virtual data stored in the swap device to physical memory before closing the device. This is much different than losing access to the swap data due to a failure. <>< Lance. [EMAIL PROTECTED] wrote: > > Hm, > > I understand the necessary of redundancy; but isn't it the same > if you do a swapoff -a or swap-disks dies on a system? > What I have in mind is the thing, that the system should not swap > at all, so that it is necessary to have as much memory (RAM) as > possible.
system panic when reading /proc/mdstat while doing raidstop.
There seems to be a major problem when reading /proc/mdstat while a raid set is being stopped. This conflict will very rarely be seen, but I have a daemon that monitors /proc/mdstat every two seconds, and once in a while the system panics during testing. While running the script below, which pounds on (reading) /proc/mdstat for about a second and then backs off for a second, I got a panic after 3 start/stop cycles, and it happened during the stop. Also, doing this exercise turned up something else interesting: without the 'sleep 1', the raid driver will not resync or stop. I would think that a window of time would eventually open up, but after about a minute of waiting, still nothing happened. Another thing: reading /proc/mdstat seems to be relatively slow. It takes over a second to read it 100 times, though perhaps part of this delay is in the script processing. Any comments or fixes :-) are appreciated. <>< Lance.

#--START OF SCRIPT---
#!/bin/bash
count=0
icount=0
while [ 1 ]; do
    cp /proc/mdstat /dev/null
    let count=count+1
    let icount=icount+1
    if [ $icount = 100 ]; then
        cat /proc/mdstat
        sleep 1
        echo $count
        icount=0
    fi
done
#--END OF SCRIPT---
Re: RAID+devfs patch for new kernel?
Hi Steve, I made the patches that are on Richard's site for raid+devfs. Unfortunately, I was having too many problems with devfs on my PowerPC system and had to solve my problems without devfs. I still have a patch file that I used to help create the raid+devfs patch. I don't know if it fixes all the devfs patch problems for the current versions, but if someone else wants to try, I'd be glad to give some simple instructions. <>< Lance. Steve Costaras wrote: > > Does anyone know, or is anyone working on a new combined patch for > the kernel (2.2.6 or 2.2.7) for both RAID & devfs? The last one I've seen > is for 2.2.3..
Memory buffer corruption with Raid on PPC
I have linux 2.2.3 with the raid0145-19990309 patch. On a PPC (Mac G3) system, I am getting what seems to be memory buffer corruption when using raidstart. The same kernel source run on the i386 architecture seems to be fine. To show the problem, I do something like the following...

# cd ~me
# gcc source_a.c
# raidstart /dev/md0 ; gcc source_a.c
# raidstop /dev/md0 ; gcc source_a.c
# raidstart /dev/md0 ; gcc source_a.c
# raidstop /dev/md0 ; gcc source_a.c
# raidstart /dev/md0 ; gcc source_a.c
# raidstop /dev/md0 ; gcc source_a.c
# raidstart /dev/md0 ; gcc source_a.c
# raidstop /dev/md0 ; gcc source_a.c

Usually doing this will cause the compile to have a problem, such as "Illegal Instruction" or some compile error. This indicates some sort of memory buffer corruption, since gcc does all of this in memory. The problem seems to be in the raidstart area: it shows up after starting the array, but once the array is started (and passes the gcc test,) it works fine. Any ideas? Thanks, <>< Lance.
Re: Day 7 and still no satisfaction
Carl, The 2.2.4 kernel does not have the latest raid code, and the raid patches do not yet cleanly apply to the 2.2.4 kernel. I suggest you start with the 2.2.3 kernel, apply the appropriate raid patch (raid0145-19990309-2.2.3.gz), and get the latest raidtools (raidtools-19990309-0.90.tar.gz). The best, but not great, documentation comes with the raidtools. <>< Lance. > Carl Hilinski wrote: > > I am quickly reaching the end of the rope. I wanted to learn about > RAID in linux (having used it much in NT), so I tried to patch Redhat > 2.0.36 with the 0145 raid patch, which simply returned "X not set" > messages in defconfig.rej. Since I could find no info on what to do to > solve that, I upgraded to kernel 2.2.4 (which I assume doesn't need > the 0145 patch since it has the "personalities" and raid 1 and 5 can > be selected in the make config). So I set up a 100+mb partition as > hda5 and a 100+mb partition on hdb1(both of which were configured > under the original Redhat 5.2 install), umounted them, set up the > raidtab to say use Raid 1 with the /dev/hda5 and /dev/hdb1 partitions, > no spares and the persistent superblock 1 value. When I do a > mkraid --really-force /dev/md1, I get the message: > disk 0: /dev/hdb1 166129kb, raid superblock at 166016kb > disk 1: /dev/hda5 167296kb, raid superblock at 167232kb > mkraid: aborted > > What happened? I've spent days and days on trying to make this work (I > had to install a WinNT server because I had a deadline and couldn't > make this work). What did I do wrong? And how would I know? There's no > docs on what happens when it all goes wrong. > > ch
Re: Filesystem corruption (was: Re: Linux 2.2.4 & RAID - success report)
I have also experienced file system corruption with 2.2.4. The problem most likely lies in the /fs/buffer.c file which the raid patch had a conflict with. <>< Lance. Tony Wildish wrote: > this sound to me like bad memory. I had a very similar problem recently > and it was a bad SIMM. I was lucky enough to have four SIMMS in the > machine so I can still run with only two, having removed the bad SIMM and > its partner > > On Mon, 29 Mar 1999, Richard Jones wrote: > > > Not so fast there :-) > > > > In the stress tests, I've encountered almost silent > > filesystem corruption. The filesystem reports errors > > as attached below, but the file operations continue > > without error, corrupting files in the process. At > > no time did the RAID software report any problem, nor > > did any reconstruction kick in. > > > > Anyone have any ideas what might be going on? It doesn't > > seem to be exclusively a 2.2.4 thing. I've seen similar > > problems with 2.0.36-19990128.
raid5: md0: unrecoverable I/O error for block x
Hi, If I "scsi remove-single-device" two devices from a RAID5 set, I would expect the RAID device to eventually fail itself. But it seems to be stuck in some sort of loop spitting out raid5: md0: unrecoverable I/O error for block where the block number appears to cycle. Top shows that raid5d is taking 99% of the CPU. Note: the device was busy with activity when I logically removed the second of the two devices. <>< Lance.
read_ahead in md driver.
Hi, I have noticed that the read_ahead value is set to 1024 in the md driver. Why is this value so large? I would think a value of 128 or so would be more appropriate. <>< Lance.
md: bug in file raid5.c, line 666 (line of raid5_error code)
Hi, I am doing some tests with raid. I will probably have more posts on other situations, but here is a situation that causes raid problems...

Scenario:
1) mkraid /dev/md/0   # raid5, three drives, no spare (using devfs)
2) Wait for the resync to complete.
3) Disable one of the drives.
4) mke2fs /dev/md/0

Environment: PC + Linux 2.2.2pre2 + raid 19990128 patch + devfs + (out of memory patch) + sym53c8xx scsi driver.

The mke2fs process starts queuing *many* scsi requests before the first request fails on the crippled device. Each of the queued scsi requests fails and starts a scsi bus reset cycle, and the raid driver spits out its banner and superblock dump along with other things, and somewhere in there is a program bug message. The scsi reset cycle takes over a second, and there were perhaps 150 or more queued items, so it took a while before the system gave up on the mke2fs process. After that, things worked okay in degraded mode.

Can't the raid driver de-queue any requests it has for a device it has marked bad? In my case, the raid driver apparently re-issued the device requests for the same blocks to the other good drives. This eventually ran out of memory, which terminated the mke2fs. I don't mind this happening to mkfs, but it may happen to something else much more critical.

Note: since I have devfs changes in raid5.c, the line number 666 (I didn't make that up,) is probably different than in the standard code. The message comes from an MD_BUG message within the raid5_error() routine. <>< Lance.

Log of some of the bad activity:
Feb 15 14:21:02 myk6 kernel: ncr53c895-0-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 15)
Feb 15 14:21:02 myk6 kernel: ncr53c895-0-<6,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31)
Feb 15 14:21:04 myk6 kernel: scsi0 channel 0 : resetting for second half of retries.
Feb 15 14:21:04 myk6 kernel: SCSI bus is being reset for host 0 channel 0.
Feb 15 14:21:01 myk6 kernel: scsidisk I/O error: dev 08:01, sector 88
Feb 15 14:21:01 myk6 kernel: md: bug in file raid5.c, line 666
Feb 15 14:21:01 myk6 kernel:
Feb 15 14:21:01 myk6 kernel: **
Feb 15 14:21:01 myk6 kernel: * *
Feb 15 14:21:01 myk6 kernel: **
Feb 15 14:21:01 myk6 kernel: md0: array superblock:
Feb 15 14:21:01 myk6 kernel: SB: (V:0.90.0) ID: CT:36c88867
Feb 15 14:21:01 myk6 kernel: L5 S04440832 ND:3 RD:3 md0 LO:0 CS:32768
Feb 15 14:21:01 myk6 kernel: UT:36c88f64 ST:0 AD:2 WD:2 FD:1 SD:0 CSUM:14c414c6 E:0008
Feb 15 14:21:01 myk6 kernel: D 0: DISK
Feb 15 14:21:01 myk6 kernel: D 1: DISK
Feb 15 14:21:01 myk6 kernel: D 2: DISK
Feb 15 14:21:01 myk6 kernel: D 3: DISK
Feb 15 14:21:01 myk6 kernel: D 4: DISK
Feb 15 14:21:01 myk6 kernel: D 5: DISK
Feb 15 14:21:01 myk6 kernel: D 6: DISK
Feb 15 14:21:01 myk6 kernel: D 7: DISK
Feb 15 14:21:01 myk6 kernel: D 8: DISK
Feb 15 14:21:01 myk6 kernel: D 9: DISK
Feb 15 14:21:01 myk6 kernel: D 10: DISK
Feb 15 14:21:01 myk6 kernel: D 11: DISK
Feb 15 14:21:01 myk6 kernel: THIS: DISK
Feb 15 14:21:01 myk6 kernel: rdev sd/c0b0t0u0p1: O:sd/c0b0t0u0p1, SZ: F:1 DN:0 no rdev superblock!
Feb 15 14:21:01 myk6 kernel: rdev sd/c0b0t6u0p1: O:sd/c0b0t6u0p1, SZ:0032 F:0 DN:2 rdev superblock:
Feb 15 14:21:01 myk6 kernel: SB: (V:0.90.0) ID: CT:36c88867
Feb 15 14:21:01 myk6 kernel: L5 S04440832 ND:3 RD:3 md0 LO:0 CS:32768
Feb 15 14:21:01 myk6 kernel: UT:36c88f64 ST:0 AD:2 WD:2 FD:1 SD:0 CSUM:4b8ca45b E:0008
Feb 15 14:21:01 myk6 kernel: D 0: DISK
Feb 15 14:21:01 myk6 kernel: D 1: DISK
Feb 15 14:21:01 myk6 kernel: D 2: DISK
Feb 15 14:21:01 myk6 kernel: D 3: DISK
Feb 15 14:21:01 myk6 kernel: D 4: DISK
Feb 15 14:21:01 myk6 kernel: D 5: DISK
Feb 15 14:21:01 myk6 kernel: D 6: DISK
Feb 15 14:21:01 myk6 kernel: D 7: DISK
Feb 15 14:21:01 myk6 kernel: D 8: DISK
Feb 15 14:21:01 myk6 kernel: D 9: DISK
Feb 15 14:21:01 myk6 kernel: D 10: DISK
Feb 15 14:21:01 myk6 kernel: D 11: DISK
Feb 15 14:21:01 myk6 kernel: THIS: DISK
Feb 15 14:21:01 myk6 kernel: rdev sd/c0b0t1u0p1: O:sd/c0b0t1u0p1, SZ:0032 F:0 DN:1 rdev superblock:
Feb 15 14:21:01 myk6 kernel: SB: (V:0.90.0) ID: CT:36c88867
Feb 15 14:21:01 myk6 kernel: L5 S04440832 ND:3 RD:3 md0 LO:0 CS:32768
Feb 15 14:21:01 myk6 kernel: UT:36c88f64 ST:0 AD:2 WD:2 FD:1 SD:0 CSUM:4b8ca449 E:0008
Feb 15 14:21:01 myk6 kernel: D 0: DISK
Feb 15 14:21:01 myk6 kernel: D 1: DISK
Feb 15 14:21:01 myk6 kernel: D 2: DISK
Feb 15 14:21:01 myk6 kernel: D 3: DISK
Feb 15 14:21:01 myk6 kernel: D 4: DISK
Feb 15 14:21:01 myk6 kernel: D 5: DISK
Feb 15 14:21:01 myk6 kernel: D 6: DISK
Feb 15 14:21:01 myk6 kernel: D 7: DISK
Feb 15 14:21:01 myk6 kernel: D 8: DISK
Feb 15 14:21:01 myk6 kernel: D 9: DISK
Feb 15 14:21:01 myk6 kernel: D 10: D
Re: disconnecting live disks
steve rader wrote: > > Some eec person once told me that disconnecting live molex > (power) scsi connectors can kill a disk drive. And I'm also > not confortable futzing with scsi connectors on live busses. > > I assume the perferred method is to put the disk-to-kill > on a external power supply with a power switch. > > Is there a safe way without an external power supply? > Maybe you could rig a switch into a drive power cable to kill the 12-volt line, or make a power-cable extension with a switch in it; then you could remove it once done testing. Killing the 12-volt line will effectively "break" the drive. It would be interesting to see how various drives handle their error reporting. I have a power splitter cable; I think I'll put a switch in it and see what happens. <>< Lance.
Re: [BUG] v2.2.0 heavy writing at raid5 array kills processes
Markus Linnala wrote: > > v2.2.0 heavy writing at raid5 array kills processes randomly, including init. > > Normal user can force random processes to out of memory > situation when writing stuff at raid5 array. This makes the raid > > I get 'Out of memory for init. ' etc. with following simple command: > > dd if=/dev/zero of=file > I am also getting "Out of memory for .." when trying to mke2fs. I originally thought this was limited to the PowerPC, but I moved my raid set to a PC and got the same results. My setup is a raid5 with three Ultra2 drives of 4GB each. I am using a Symbios 53c895 chip with the alpha sym53c8xx driver, Linux 2.2.1, and the latest 012899 raid patches. It seems as though the processor is outpacing the I/O and using up the 64MB of memory for buffers. I am curious which configurations out there do work. Or maybe it would be better to know which ones don't, so it can get fixed. <>< Lance.
Re: Physical device tracking....
James, First of all, you probably want to reboot. This will rename your devices to their typical values. To add a device into a failed raid slot, you can use the raidhotadd command. Do something like: raidhotadd /dev/md0 /dev/hdc2 This will add the device to the raid set and start a resync operation. BTW: I hope you are only trying raid out with the setup shown below. Using the same device more than once in a raid set is 1) slow, and 2) unsafe: it does not protect your data. I hope this helps some. I may be off target in what you have done and what you want to do. <>< Lance. A James Lewis wrote: > After testing various failure conditions, I seem to be stuck because the > system allocated new disk numbers to the disks > > RAID1 conf printout: > --- wd:1 rd:2 nd:3 > disk 0, s:0, o:1, n:0 rd:0 us:1 dev:hdb1 > disk 1, s:0, o:0, n:1 rd:1 us:1 dev:[dev 00:00] > disk 2, s:1, o:0, n:2 rd:2 us:1 dev:hdb3 > disk 3, s:1, o:0, n:3 rd:3 us:1 dev:hdb2 > disk 4, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00] > > I need to get "disk 1" back to the correct device is there a way to do > this? Perhaps there is a FAQ, but I guess not since this is so new >
Re: [BUG] v2.2.0 heavy writing at raid5 array kills processes
I have also noticed this type of problem. It seems as though the RAID5 driver generates a growing write backlog and keeps allocating new buffers when new asynchronous write requests get in. Eventually it reserves all the available physical memory. Trying to swap data to virtual memory storage would only make the situation worse. I'm not sure where the responsibility lies for this problem. The md driver can limit how much it allocates, but the memory manager should be able to handle this situation better. Markus Linnala wrote: > > v2.2.0 heavy writing at raid5 array kills processes randomly, including init. > > Normal user can force random processes to out of memory > situation when writing stuff at raid5 array. This makes the raid > > I get 'Out of memory for init. ' etc. with following simple command: > > dd if=/dev/zero of=file > > Repeatable, file is between 100-200M after dd gets killed. > I guess this killing action seems to be triggered by swapping. >
Where is 2.1.131-ac11 kernel
I've been hearing about 2.1.131-ac9, and now 2.1.131-ac11. What does the -acX mean and where is it available? Thanks, <>< Lance.
Re: raid0145 & devfs v79
Eric van Dijken wrote: > Is there somebody working on joining the devfs patch and the raid patch in > the linux kernel (2.1.130) ? > I am planning on working on this issue sometime this week. <>< Lance.
Raid5 pauses when doing mke2fs on PowerPC
Hi all, The RAID5 md driver pauses for 10-11 seconds, many times, while doing a mke2fs. The pauses start after 300-400 groups have been written; then small bursts of transfers happen between pauses until the process is done. The bursts of transfers between pauses range from .01 seconds to maybe 3 seconds. Under 'normal' use afterwards, everything seems fine. The problem is memory-resource sensitive: with more RAM, the pauses start later in the mkfs. The problem is possibly a race condition between the raid daemon and the code that restarts it.

My environment is...
PowerPC G3 266MHz, 64MB RAM.
Symbios (LSI) 53c895 PCI SCSI chip.
Three 4GB LVD drives (1 Quantum Viking II, 2 Seagate Barracudas).
Kernel 2.1.127 & related raid patches.
raid5 cluster size is 32k.

Since most folks are using x86 systems and I haven't heard of this problem on the raid list, it seems to be specific to the powerpc. Any thoughts?