RE: Swap on RAID
Well, the reason we have our systems set to swap on RAID (we use RAID-1) is that it improves robustness: even if one of our disks dies, swap continues to work and the system stays stable. I also believe it is possible to use RAID-10 to stripe and mirror and actually improve swap performance. With the new SCSI controllers you could probably approach 160MB/s swap speed. Not bad, and a heck of a lot better than a single disk at ~20MB/s. I've never tested the performance of this, though; as I said, we use RAID for increased stability.

--Rainer

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Henry J. Cobb
Sent: Friday, June 02, 2000 9:00 AM
To: [EMAIL PROTECTED]
Subject: Swap on RAID

Does anybody really want to wait while their swap data is duplicated out to multiple disks by a CPU that is working to free up memory to run applications? Isn't swapping slow enough already? Why not simply swap on multiple disks, get hardware RAID-5 for swap, or buy RAM?
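For Henry's "simply swap on multiple disks" alternative, the kernel can stripe swap itself when several swap partitions share the same priority. A sketch of the relevant /etc/fstab lines (device names are placeholders, not from the thread):

```
# Equal pri= values make the kernel interleave swapped pages
# across both partitions -- striping without any RAID layer,
# but no redundancy: losing either disk corrupts swap.
/dev/sda2   none   swap   sw,pri=1   0 0
/dev/sdb2   none   swap   sw,pri=1   0 0

# Swap on a RAID-1 md device instead trades that speed for the
# robustness Rainer describes (survives a single disk failure):
# /dev/md0  none   swap   sw         0 0
```

The trade-off is exactly the one in the two posts: equal-priority swap gives RAID-0-like throughput for free, while swap on a mirrored md device keeps the machine alive through a disk death.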
Copying partition information
Hi all,

Is there an easy way to copy the partition information from one disk to another disk that is exactly the same size? I'm guessing a dd on the right device might do this, but the exact command would be appreciated. My goal is to not have to manually fdisk multiple disks before syncing them into an existing RAID-1 array.

--Rainer
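Assuming the disks really are identical in geometry, sfdisk can dump and replay a partition table more safely than raw dd; a sketch (the device names are examples only):

```shell
# Dump sda's partition table as text, then write it to sdb.
sfdisk -d /dev/sda > sda.layout
sfdisk /dev/sdb < sda.layout

# The raw-dd alternative copies the first sector, which holds the
# MBR partition table -- but it also copies the boot code in that
# sector, and silently does the wrong thing if the disks differ:
# dd if=/dev/sda of=/dev/sdb bs=512 count=1
```

The sfdisk route is preferable for exactly this fdisk-many-disks-before-RAID-1 case, since the text dump can be replayed onto each target disk in turn.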
RE: Raid5 with two failed disks?
Hi all,

I think my situation is the same as this "two failed disks" one, but I haven't been following the thread carefully and I just want to double check. I have a mirrored RAID-1 setup between 2 disks with no spare disks. Inadvertently the machine got powered down without a proper shutdown, apparently causing the RAID to become unhappy. It would boot to the point where it needed to mount root and then would fail, saying that it couldn't access /dev/md1 because the two RAID disks were out of sync.

Given this situation, how can I rebuild my array? Is it enough to do another mkraid (given that the raidtab is identical to the real setup, etc.)? If so, since I'm also booting off of RAID, how do I do this for the boot partition? I can boot up using one of the individual disks (e.g. /dev/sda1) instead of the RAID device (/dev/md1), but if I do that, will I be able to do a mkraid on an in-use partition? If not, how do I resolve this (boot from floppy?).

Finally, is there any way to automate this recovery process? That is, if the machine is improperly powered down again, can I have it automatically rebuild itself the next time it comes up?

Thanks in advance,
--Rainer
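With the raidtools of that era, the usual recovery for an out-of-sync but physically healthy mirror was to re-run mkraid with its force option, which rewrites the superblocks described in /etc/raidtab and kicks off a resync. A sketch, assuming the raidtab still matches the real layout (flag spelling varies between raidtools versions):

```shell
# Boot from a rescue floppy/CD with a RAID-capable kernel so that
# /dev/md1 is not mounted, then rebuild the superblocks:
mkraid --really-force /dev/md1   # some versions accept -f / --force

# Watch the mirror resync:
cat /proc/mdstat
```

This only re-stamps superblocks; it relies on the data on the member partitions still being intact, which is why it must be run against the correct raidtab.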
RE: Raid5 with two failed disks?
Hmm, well, I'm certainly not positive why it wouldn't boot, and I don't have the logs in front of me, but I do remember it saying that it couldn't mount /dev/md1 and therefore panicked during boot. My solution was to specify the root device as /dev/sda1 instead of the configured /dev/md1 from the lilo prompt. The disk is marked for auto RAID start and marked as fd. And it booted just fine until the "dumb" shutdown. As for a rescue disk, I'll put one together. Thanks for the advice.

--Rainer

-----Original Message-----
From: Michael Robinton [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 03, 2000 8:50 AM
To: Rainer Mager
Cc: Jakob Ostergaard; [EMAIL PROTECTED]
Subject: RE: Raid5 with two failed disks?

Whether or not the array is in sync should not make a difference to the boot process. I have both RAID-1 and RAID-5 systems that run root on RAID and will boot quite nicely and resync automatically after a "dumb" shutdown that leaves them out of sync. Do you have your kernel built for auto RAID start? And partitions marked "fd"? You can reconstruct your existing array by booting with a kernel that supports RAID and with the raidtools on the rescue system. I do it all the time.

Michael
Adding RAID-1 to an existing partition.
Hi all,

I'm trying to add mirroring RAID to an existing partition. Is this possible? It appears to me that when I do mkraid I'm going to kill the existing data. Is there no way to tell it to use my existing data in a new RAID setting and sync it to the new partition(s)?

Thanks,
--Rainer
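The standard raidtools-era workaround was to build the mirror in a degraded state: list the new, empty partition as the working disk and mark the slot for the data-carrying partition as failed-disk so mkraid never writes to it, copy the data onto the new md device, then hot-add the old partition. A sketch with placeholder devices (/dev/sda1 holds the existing data, /dev/sdb1 is the new empty partition):

```shell
# /etc/raidtab sketch for the degraded mirror:
#   raiddev /dev/md0
#       raid-level      1
#       nr-raid-disks   2
#       device          /dev/sdb1
#       raid-disk       0
#       device          /dev/sda1     # existing data, untouched
#       failed-disk     1

mkraid /dev/md0              # creates a one-sided mirror on sdb1 only
mke2fs /dev/md0              # new filesystem on the array
mount /dev/md0 /mnt/new
cp -a /olddata/. /mnt/new/   # copy the existing data across

# After verifying the copy, add the old partition; the md layer
# resyncs it as the second half of the mirror (erasing it):
raidhotadd /dev/md0 /dev/sda1
```

This destroys the original in the final step, so it only works if the copy is verified first; the RAID superblock also steals a little space at the end of each partition, which is why the copy step cannot be skipped.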
Re: is Raid 0.9 and SMP safe
Speaking of IDE, can anyone answer me this? How come, when I use my IDE drive heavily (like cat /dev/zero > somefile), my system virtually freezes up? This includes mouse, keyboard, everything. Eventually it comes back when the disk I/O is done, but in the interim nothing moves. And yes, I do have DMA set to on via hdparm.

Thanks,
--Rainer

P.S. I too have an SMP machine. It is wonderful when using VMware.
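That whole-system stall is the classic symptom of the IDE driver masking interrupts during transfers; besides DMA, hdparm has settings that affect it. A sketch of the checks (example device /dev/hda, and -u1 is known to be unsafe on some chipsets, so treat it as a try-at-your-own-risk assumption):

```shell
# Show the drive's current settings first:
hdparm /dev/hda

# Flags that commonly help interactivity under heavy IDE load:
#   -d1   use DMA
#   -u1   unmask other interrupts during disk I/O
#   -c1   enable 32-bit I/O support
hdparm -d1 -u1 -c1 /dev/hda
```

If -d1 is already set, -u1 is the usual next suspect for the frozen-mouse-and-keyboard behaviour described above.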
RE: Argh, more problems with SCSI and RAID
Hi all,

I'm back and looking for more advice. OK, so I got my SCSI stuff running, seemingly smoothly, and decided to try out RAID again. I started up md0 with 4 drives and 1 spare (actually the spare was just another partition on one of the base 4 drives, but what the heck). I then made my ext2 filesystem and copied in about 2 Gig of stuff, and everything was happy and spinning. Then I tried to make a backup to my SCSI tape drive, and this also worked fine. Finally I decided to verify the backup, and this died in the middle with oodles of SCSI errors.

Now, instead of trying to figure out specifically what went wrong, I'd like to ask the SCSI gurus: what is the best way to diagnose SCSI problems? How do you know if your cables are too long? Is it a bad idea to set your SCSI controller to a fast sync speed (e.g., 40 MB/s) when a drive can only handle 5? Under what conditions should I set my controller to async for a particular drive instead of sync? And finally, why is this SCSI stuff so darn mysterious and difficult? I have a 20 Gig IDE drive that is giving me about 18MB/s, and my fastest SCSI disk only gives me about 9MB/s. What am I paying so much for SCSI for? Sorry for the slight ranting above. Answers and advice would be appreciated.

Now, on to RAID. So, I rebooted and set one drive to async (the drive that seemed like it might have been having problems). So far, the RAID drive seems fine. I created a 2 Gig file and it was happy with that. I read the file back. Still no problems. Then I did a hdparm speed test on /dev/md0. Hmm, only 7 MB/s. Why would that be, since my fastest single SCSI drives can pump out 9 MB/s? I thought this RAID-5 stuff was supposed to INCREASE performance. Well, at least I enjoy this hacking stuff.

Any ideas?
--Rainer
Argh, more problems with SCSI and RAID
Hi all,

I'm trying to do 2 things with SCSI/RAID, both of which are having problems. Any help would be greatly appreciated. My system is 2.2.14 with the 5.1.22 Adaptec AIC-7xxx drivers.

First, I'm trying to get 4 SCSI drives working. Forget RAID, forget anything complex, I just want them working. I actually had 5 until yesterday, when one decided to start making loud noises which I interpreted as death throes. Anyway, the problem is that after I start up my 4 drives, at some point I get this:

Feb 1 02:48:01 dual kernel: scsi : aborting command due to timeout : pid 10920, scsi1, channel 0, id 0, lun 0 Read (6) 00 06 df 60 00
Feb 1 02:48:03 dual kernel: scsi : aborting command due to timeout : pid 10921, scsi1, channel 0, id 1, lun 0 Read (6) 00 07 c7 08 00
Feb 1 02:48:03 dual kernel: scsi : aborting command due to timeout : pid 10922, scsi1, channel 0, id 1, lun 0 Read (6) 00 07 d7 48 00
Feb 1 02:48:31 dual kernel: scsi : aborting command due to timeout : pid 10919, scsi1, channel 0, id 0, lun 0 Read (6) 00 06 bf 18 00
Feb 1 02:48:31 dual kernel: scsi : aborting command due to timeout : pid 10920, scsi1, channel 0, id 0, lun 0 Read (6) 00 06 df 60 00

These messages went on for about 5 hours before I finally rebooted the machine (nothing else worked). Note that my file system stuff is IDE right now; nothing is on the 4 SCSI disks yet. Given this, I question why my whole system became inoperative just because SCSI was down. Note that I could still do things that didn't use disk access. It was only when I tried to access a disk (IDE or SCSI) that things froze up. So, can anyone hypothesize why the above happened and what I can do to fix it? Before it happened I did a few simple tests on the drives, like:

cat /dev/zero > /mnt/t1/test

This worked fine and created files of about 250MB before I broke the cat with CTRL-C.

The second problem, possibly related to the first, is this. I did try a few experiments with SCSI.
I have significantly different sized drives and was hoping to maximize their usage by using a RAID-0 md device (md0) as part of a RAID-5 array. When I tried the mkraid on this, the command started but immediately froze up the SCSI system. Should this work? Is there any other way to achieve disk maximization along with redundancy?

On a different note, I had a success with SCSI/Linux last night as well. When I got home, one drive was making noises as mentioned above. I un-mounted all SCSI drives, removed all SCSI-related modules, turned off the unhappy drive (in an external disk case), removed it, plugged all the other SCSI stuff back together, and turned it back on. Recompiled the AIC driver with 5.1.22, reloaded the SCSI modules, and everything came back up, probed the buses, and worked. All of this without turning off my machine. Lovely! Now if I can just make it stable.

Thanks,
--Rainer
Re: Argh, more problems with SCSI and RAID
parity
Feb 1 07:48:16 dual kernel: RAID5 conf printout:
Feb 1 07:48:16 dual kernel: --- rd:3 wd:3 fd:0
Feb 1 07:48:16 dual kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda3
Feb 1 07:48:16 dual kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc3
Feb 1 07:48:16 dual kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdc2
Feb 1 07:48:16 dual kernel: disk 3, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 4, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 5, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual smb: smbd startup succeeded
Feb 1 07:48:16 dual kernel: disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: RAID5 conf printout:
Feb 1 07:48:16 dual kernel: --- rd:3 wd:3 fd:0
Feb 1 07:48:16 dual kernel: disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda3
Feb 1 07:48:16 dual kernel: disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdc3
Feb 1 07:48:16 dual kernel: disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdc2
Feb 1 07:48:16 dual kernel: disk 3, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 4, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 5, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
Feb 1 07:48:16 dual kernel: md: updating md1 RAID superblock on device
Feb 1 07:48:16 dual kernel: sdc2 [events: 0007](write) sdc2's sb offset: 3004032
Feb 1 07:48:16 dual kernel: md: syncing RAID array md1
Feb 1 07:48:16 dual kernel: md: minimum _guaranteed_ reconstruction speed: 100 KB/sec.
Feb 1 07:48:16 dual kernel: md: using maximum available idle IO bandwith for reconstruction.
Feb 1 07:48:16 dual kernel: md: using 1024k window.
Feb 1 07:48:16 dual kernel: sdc3 [events: 0007](write) sdc3's sb offset: 3004032
Feb 1 07:48:16 dual kernel: sda3 [events: 0007](write) sda3's sb offset: 3004032
Feb 1 07:48:16 dual kernel: .
Feb 1 07:48:16 dual kernel: ... autorun DONE.

Also from previously:

Feb 1 00:36:30 dual kernel: running: sdb2sdc2sdc3sda3
Feb 1 00:36:30 dual kernel: now!
Feb 1 00:36:30 dual kernel: sdb2's event counter: 0001
Feb 1 00:36:30 dual kernel: sdc2's event counter: 0005
Feb 1 00:36:30 dual kernel: sdc3's event counter: 0005
Feb 1 00:36:30 dual kernel: sda3's event counter: 0005
Feb 1 00:36:30 dual kernel: md: superblock update time inconsistency -- using the most recent one
Feb 1 00:36:30 dual kernel: freshest: sdc2
Feb 1 00:36:30 dual kernel: md: kicking non-fresh sdb2 from array!
Feb 1 00:36:30 dual kernel: unbindsdb2,3
Feb 1 00:36:30 dual kernel: export_rdev(sdb2)
Feb 1 00:36:30 dual kernel: md1: removing former faulty sdb2!
Feb 1 00:36:30 dual kernel: raid5 personality registered

What might have caused sdb2 to get kicked?

Also, a question about the mysteries of SCSI termination. I mentioned that one drive died and I removed it. When I removed it from its external case, I left the other drive in the case (it is a case for 2 drives). This means that the ribbon cable in that case now only has one device connected to it, not 2. Should this make a difference? My understanding is no.

Also, regarding figuring out the general SCSI stuff before playing with RAID: I heartily agree. Ugh, now my head hurts.
--Rainer

----- Original Message -----
From: James Manning [EMAIL PROTECTED]
To: Rainer Mager [EMAIL PROTECTED]
Cc: Linux-RAID [EMAIL PROTECTED]
Sent: Tuesday, February 01, 2000 12:00 PM
Subject: Re: Argh, more problems with SCSI and RAID

| Tell us how the 4 (5?) drives are scattered among the 2 controllers (2
| channels of one controller?), which id's, etc. The above looks like my
| 3960D problem that went away when I added "noapic" on my BP6 board.
|
| Perhaps it's a termination issue since it looks limited to 2 drives on
| one controller? Just a guess.
|
| In any case, I wouldn't bother with the md layer until you can get all
| the scsi drives acting happy for extended periods of time :)
|
| James
| --
| Miscellaneous Engineer --- IBM Netfinity Performance Development
RE: Argh, more problems with SCSI and RAID
Multiple responses to responses in 1 message, I hope no one gets confused...

First off, I did some testing with normal SCSI (no RAID) (yes, I do know this is a RAID mailing list, but everyone is being so helpful I hope I'm forgiven ;-). I wrote simultaneously to each of my 4 SCSI drives as fast as I could (via cat /dev/zero > drive/file). Doing this produced no errors, and I stopped it after about 15 minutes (approx 500MB per disk). I did receive this during the operation, though:

Feb 1 22:48:13 dual kernel: (scsi1:0:0:0) Performing Domain validation.
Feb 1 22:48:13 dual kernel: (scsi1:0:0:0) Successfully completed Domain validation.
Feb 1 22:48:18 dual named[968]: Cleaned cache of 0 RRs
Feb 1 22:48:18 dual named[968]: USAGE 949412898 949358898 CPU=0.12u/0.1s CHILDCPU=0u/0s
Feb 1 22:48:18 dual named[968]: NSTATS 949412898 949358898 A=7 MX=2 ANY=3
Feb 1 22:48:18 dual named[968]: XSTATS 949412898 949358898 RR=18 RNXD=1 RFwdR=13 RDupR=0 RFail=0 RFErr=0 RErr=0 RAXFR=0 RLame=0 ROpts=0 SSysQ=4 SAns=3 SFwdQ=6 SDupQ=13 SErr=0 RQ=12 RIQ=0 RFwdQ=0 RDupQ=4 RTCP=0 SFwdR=13 SFail=0 SFErr=0 SNaAns=3 SNXD=0

Is this normal? Should it just periodically appear?

Before, when I got all my timeouts, I was doing 2 things I haven't tried again yet. One, I had all my SCSI drives being used as swap. Two, I was trying to back up to my SCSI tape drive. I'll try these again later when I'm ready for another crash ;-(. As it is right now, everything seems stable, albeit a bit slow. Before I forget, I just wanted to say thanks for everyone's help. I really appreciate it. Any further ideas would be gratefully accepted.

Now follows some responses to responses.

-----Original Message-----
From: Stephen Waters [mailto:[EMAIL PROTECTED]]

or you could just configure the transfer rate to be one notch lower than your current level. had to do that with my 4 U2W drives in a hotswap box w/ a tekram dc390u2b (symbios chipset).
True, but I did some benchmarking today using both "hdparm -tT" and another simple test, and I'm finding terrible throughput. My one drive that is on a dedicated bus and should be able to do up to 40MB/s is only getting about 9MB/s. An older SCSI drive (on a different bus) that should be able to do 5MB/s is only getting 2MB/s. This is making me think something else is wrong. I may try your suggestion anyway and see if reducing the speed helps at all. Interestingly, Linux says the 40MB/s drive is running at 32MB/s. Either way, that is far more than the measured 9MB/s.

From: Peter Pregler [mailto:[EMAIL PROTECTED]]

I had similar problems (actually your messages could be a cut-and-paste of my old logs) with my box at the beginning. The actual problem was that the scsi-bus did not fulfill the specifications. Replacing some hardware (hot-swap boxes) solved it. BTW, all worked well under DOS in the test-environment shipped. But as soon as linux got on the box and did _really_ use the bandwidth on the bus, the troubles showed up (timeouts, renegotiation, slowdown ...).

Hmm, good info here, but I don't know how it helps. I know and acknowledge that some of my drives are old/slow, but shouldn't they still work without errors? When you say you replaced "hot-swap boxes" you are talking about drives, not SCSI cards, right?

From: Mike Black [mailto:[EMAIL PROTECTED]]

Try turning off SYNC mode on ALL your drives in the SCSI BIOS. I had a similar problem this last weekend with 2.2.14 and 5.1.21 AIC-7xxx, and async mode fixed it. I had been previously running SYNC mode with no problems, but I added two new drives and couldn't get the mkraid to finish without hitting the same errors you're seeing. All my drives are now running async and are happy. Slow, but happy. Previously (on 2.2.14 and prior) they were running happy in sync mode, but I was upgrading this weekend and had to do a mkraid again (which really bangs the SCSI bus during resync).
I tried several times with different sync/async combos with no joy. I thought this was just my config on the one machine, but this morning I had a problem on another box, which is a 3x50G 2940U2W setup. I accidentally left it NFS mounted, and during the attempted backup it also hit some SCSI timeouts -- it recovered, though. This box has been flawless for a long time (I think it must've been the network I/O causing this -- no proof yet, just suspicion).

Oh-ho. That's an interesting discovery. I'm, of course, hesitant to reduce my throughput, especially since it is so low to start out with (see above), but that is better than timeouts, etc. I'll give it a shot.

--Rainer
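One way to separate drive problems from shared-bus problems (cabling, termination, too-aggressive sync rates) is to benchmark each disk alone and then concurrently. A sketch, with placeholder device names:

```shell
# Per-drive sequential read speed, bypassing the filesystem:
hdparm -tT /dev/sda
hdparm -tT /dev/sdb

# Now read both at once. If the combined throughput collapses to
# well below the sum of the single-drive numbers, suspect the
# shared bus rather than the drives themselves:
dd if=/dev/sda of=/dev/null bs=1M count=256 &
dd if=/dev/sdb of=/dev/null bs=1M count=256 &
wait
```

A drive negotiated at 32MB/s but measuring 9MB/s alone points at the drive or its settings; drives that are fine alone but time out together points at the bus, which matches the sync-vs-async experiences quoted above.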
Re: crash/instability
I have a Supermicro MB with a built-in Adaptec AIC-7895 (dual UW SCSI 2) which seems to have some stability problems... sometimes. The machine is also SMP, which I believe contributes. What happens is that when using some of my SCSI devices, the machine will sometimes lock up completely with no messages anywhere. Sometimes I do get some syslog messages prior to the lockup regarding SCSI resets and such, but they do not always happen. Sometimes I can go for quite a long time without any problems. Making things even more interesting, these problems do not exist in 2.2.9 but do seem to exist in 2.2.x versions above that. I'm now running 2.2.14, but I've switched my primary drive to an IDE one, so I haven't put my SCSI through its paces recently. I don't know if the problem still exists, although I think it does, since I did have one lockup while using my SCSI drives as swap.

Anyway, has anyone seen anything like this, or have any suggestions on debugging? I know this isn't really RAID related, but it seems you RAID people are more likely to be using SCSI and possibly SMP machines, so I'm asking here. Any tips would be greatly appreciated.

Thanks,
--Rainer
Mixed disks
Hi all,

I have a motley set of disks that I'm hoping to use more efficiently via RAID. Any tips on this would be greatly appreciated. Right now I have:

2 x 2 Gig SCSI
1 x 3 Gig SCSI
1 x 7 Gig SCSI
1 x 4.5 Gig UW SCSI 2
1 x 20 Gig IDE

I don't know the exact type of SCSI for the first 4 drives, but they are all old and slowish. The 4.5 Gig one is nice and fast and is on a dedicated 40 MB/s SCSI bus right now. Also, I have one 128MB partition taken out of each drive that I use for swap; yes, I know that gives me about 800 MB of swap (yummy).

I know I could combine all/any of the drives using linear RAID, but I really want some redundancy. I believe that probably puts me in the RAID-5 area. The problem is that all of the disks are different sizes. I was thinking of trying to create something like a 3 Gig slice size for the RAID array. I would take one slice out of the 3 Gig, the 7 Gig, the 4.5 Gig, and 1 or more out of the 20 Gig. I would also combine the 2 x 2 Gigs into a 4 Gig (using RAID-0) and then take a 3 Gig slice out of that. Then I would have at least 5 3-Gig pieces that I would combine with RAID-5. I would probably use the other half of the 7 Gig for a backup of that array, and perhaps the same via another slice of the 20 Gig.

Any comments? Can I use a RAID-0 disk in a RAID-5 array? Is this a good or bad idea?

Thanks,
--Rainer

----- Original Message -----
From: Raid [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, January 18, 2000 10:16 AM
Subject: Re: AW: AW: IDE RAID controller?

| There is a beta driver out now but Promise are not supporting it at all.
|
| I have found an independent kernel patch that apparently works OK, but you
| cannot boot off the array. You must use a stand-alone HD for the kernel and
| then you can use it to mirror other drives. Apparently - I haven't tried
| it, but I have spoken to others that have managed to get it working. I
| spent many hours playing around with it trying to get it to boot off the
| array but I was unsuccessful.
| Promise DID say that they thought that the
| Promise driver would not support booting from the array either.
|
| SCSI RAID looks more and more interesting - hang the expense!
|
| Regards,
| Brad
|
| At 09:06 14/01/00 +0100, you wrote:
| Hi Brad,
|
| Oops, no Linux for this? I can't believe this; many list members mentioned
| this controller. But I don't really know, I just assumed that there is.
|
| I was about to take one for myself.
|
| By,
| Barney
|
| -----Original Message-----
| From: Raid [mailto:[EMAIL PROTECTED]]
| Sent: Friday, January 14, 2000 02:37
| To: Schackel, Fa. Integrata, ZRZ DA
| Subject: Re: AW: IDE RAID controller?
|
| Hi Barney.
|
| AFAIK there is no Linux support for this controller. Do you know anyone who
| has managed to get it going under Linux?
|
| Regards,
| Brad
|
| At 09:55 5/01/00 +0100, you wrote:
| Hi,
|
| how about the Promise RAID 0,1 controller?
| Have a look @ http://www.promise.com/Products/products.htm#ideraid
|
| By, Barney
|
| -----Original Message-----
| From: Raid [mailto:[EMAIL PROTECTED]]
| Sent: Wednesday, January 5, 2000 04:03
| To: [EMAIL PROTECTED]
| Subject: IDE RAID controller?
|
| Does anyone know of an ATA-66 IDE RAID controller for Linux? I have seen
| the Arco product at http://www.arcoide.com/dupli-pci.htm but it is only
| UDMA/33.
|
| Brad
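Rainer's layered plan from the "Mixed disks" post (RAID-0 of the two 2 Gig disks used as one member of a RAID-5 array) was expressible in raidtab by defining md0 first and then listing it as a device of md1. A hypothetical sketch with placeholder partitions; note that md0 must exist before md1 can start, which made autostart of such layered arrays fragile, and an earlier post in this thread reports mkraid freezing on exactly this setup:

```
# md0: stripe the two 2 Gig disks into one ~4 Gig device
raiddev /dev/md0
    raid-level      0
    nr-raid-disks   2
    chunk-size      32
    device          /dev/sdd1
    raid-disk       0
    device          /dev/sde1
    raid-disk       1

# md1: RAID-5 over equal ~3 Gig slices, one of them living on md0
raiddev /dev/md1
    raid-level      5
    nr-raid-disks   4
    chunk-size      32
    persistent-superblock 1
    device          /dev/sda2
    raid-disk       0
    device          /dev/sdb2
    raid-disk       1
    device          /dev/sdc2
    raid-disk       2
    device          /dev/md0
    raid-disk       3
```

The design caveat: the striped member has the failure probability of both its disks combined, so the RAID-5 array degrades if either 2 Gig drive dies, and losing one more slice after that loses the array.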