Re: RAID, persistent superblock on SPARC
On Sun, Jul 09, 2000 at 11:04:20PM -0700, Gregory Leblanc wrote:
> What's the current status of RAID on SPARC? I haven't had a chance to
> keep up very much, as I wasn't using RAID on SPARCs. I'm about to
> build a mirrored system here, and I'd like to make sure that I'm not
> going to get hosed because of some bug. Thanks,

I think the quick fix (copying 4096 bytes, not sizeof(md_superblock*))
was added, at least to the 2.2.16 patches; I will have to check 2.4.x
(the copy of 2.3.99-pre7 I have on my box does not include it). The
larger patch I sent, which fixed the on-disk format and allowed RAID
arrays to be stored at cylinder 0 of SPARC-labeled disks, did not make
it into either version (I will have to ask Ingo about it).

Jakub
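[For readers unfamiliar with the bug class Jakub is describing, here is
a minimal hypothetical sketch in C; the identifiers are illustrative
and are not the actual md driver code. sizeof() applied to a pointer
yields the pointer's size (4 or 8 bytes), not the 4096-byte superblock
it points to:]

    /* Hypothetical illustration only -- not the real kernel source. */
    #include <string.h>

    #define MD_SB_BYTES 4096   /* on-disk size of the RAID superblock */

    struct md_superblock {
        /* ... real fields elided; padded to the full block ... */
        char pad[MD_SB_BYTES];
    };

    void copy_sb(void *dst, struct md_superblock *sb)
    {
        /* Buggy: sizeof(sb) is the size of the *pointer*, so only
           the first 4 or 8 bytes of the superblock get copied. */
        memcpy(dst, sb, sizeof(sb));

        /* Intended: copy the entire 4096-byte superblock. */
        memcpy(dst, sb, MD_SB_BYTES);
    }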
Re: RAID possible with disks of varying sizes?
software raid will NOT save you from power failure. it will save you
from disk/controller/cable failure only! do NOT lull yourself into a
false sense of security. if you have people who can't handle unix and
powering down, then you need a UPS and to lock your box in a closet.

linux software raid uses partitions, not disks, as its slices, so it
does not matter if your disks are not all the same size, as long as you
make the partitions the same size. (a minimal raidtab sketch follows
below the quoted message.)

what you do depends on your application, but i would definitely move
one of those ide disks to its own bus, even if i was not going to raid.
the 4 gig scsi is probably a good investment, since the different
controller/cabling from the ide disks should provide better redundancy.

allan

Micah Anderson [EMAIL PROTECTED] said:
> I have got a machine that nearly coughed up blood yesterday because
> someone pulled the power on it. The fscks were nasty; let me tell you,
> I am happy for backup superblocks. Anyway, that was too close. I need
> a RAID solution in this weekend, or I am going to panic. The problem
> is that the hardware I have available is haphazard PC hardware. You
> know what I mean. Currently being used on the machine I've got:
>
> hda: WDC AC22100H, 2014MB
> hdb: WDC AC31600H, 1549MB
>
> Available to kludge together a RAID system are various pieces of
> useful hardware. Three 1 gig and a 4 gig scsi drives are available.
> I've got a bunch of 540s, which considering their size are probably
> old and not worth the hassle. I can pick up something else if it comes
> down to that, but it would be great if I could use what I have. Can I
> take these random drives and make them into something that I can rely
> on? Do I need to have drives that are exactly the same size as the
> drives that I want to mirror, or can I take a couple here, a couple
> there and make them useful? Thanks!
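[To make allan's point concrete, here is a minimal sketch of an
/etc/raidtab for a RAID-1 mirror built from equal-sized partitions on
two differently-sized disks. The device names and the partition size
are assumptions for illustration, not Micah's actual layout:]

    # Hypothetical example: mirror two 1GB partitions, one per disk.
    # The disks may differ in size; only the partitions must match.
    raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        persistent-superblock   1
        chunk-size              4
        device                  /dev/hda2    # 1GB partition, 2GB disk
        raid-disk               0
        device                  /dev/sda1    # 1GB partition, 4GB disk
        raid-disk               1

[After creating the equal-sized partitions with fdisk, the array would
be initialized with mkraid /dev/md0 and then formatted like any other
block device.]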
lilo error for degraded root device
I have kernel-2.2.16-3.i386.rpm. I have lilo 21.4-3.

I have /etc/raidtab =

raiddev /dev/md0
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              4
    device                  /dev/sda1
    failed-disk             0
    device                  /dev/sdb1
    raid-disk               1
raiddev /dev/md1
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              4
    device                  /dev/sda5
    raid-disk               0
    device                  /dev/sdb5
    raid-disk               1
raiddev /dev/md2
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              4
    device                  /dev/sda6
    raid-disk               0
    device                  /dev/sdb6
    raid-disk               1
raiddev /dev/md3
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              4
    device                  /dev/sda7
    raid-disk               0
    device                  /dev/sdb7
    raid-disk               1
raiddev /dev/md4
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              4
    device                  /dev/sda8
    raid-disk               0
    device                  /dev/sdb8
    raid-disk               1
raiddev /dev/md5
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              4
    device                  /dev/sda9
    raid-disk               0
    device                  /dev/sdb9
    raid-disk               1

I have the following mounts =

Filesystem         1k-blocks     Used Available Use% Mounted on
/dev/sda1             127902    40987     80311  34% /
/dev/md1            10079980    37028   9530908   0% /home
/dev/md2             4032448   416788   3410816  11% /usr
/dev/md4              254699       14    241533   0% /tmp
/dev/md5              254699     6766    234781   3% /var
/dev/md3             2016656     1992   1912220   0% /usr/local
/dev/md0              127790    40988     80204  34% /stage

/proc/mdstat =

Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] 131968 blocks [2/1] [_U]
md3 : active raid1 sdb7[1] sda7[0] 2048896 blocks [2/2] [UU]
md1 : active raid1 sdb5[1] sda5[0] 10240896 blocks [2/2] [UU]
md2 : active raid1 sdb6[1] sda6[0] 4096896 blocks [2/2] [UU]
md4 : active raid1 sdb8[1] sda8[0] 263040 blocks [2/2] [UU]
md5 : active raid1 sdb9[1] sda9[0] 263040 blocks [2/2] [UU]
unused devices: <none>

/stage/etc/lilo.conf =

boot = /dev/md0
delay = 5
vga = normal
root = /dev/md0
image = /boot/bzImage
    label = linux

When I run lilo I get the following:

[root@otherweb /root]# lilo -r /stage
boot = /dev/sdb, map = /boot/map.0811
Added bzImage *
Syntax error near line 2 in file /etc/lilo.conf

raid1 is in the kernel, and /stage is a copy of the / fs. Even with a
/stage/etc/lilo.conf of

boot=/dev/md0
image=/boot/bzImage
label=linux
root=/dev/md0

I get the same error message.

The idea is to boot the new md0 root partition in degraded mode and
then raidhotadd the sda1 partition (which is currently root) to the md0
device. Anyone have any idea what I'm doing wrong? Is there a better
way?

Hugh.
Re: speed and scaling
> arguably only 500gb per machine will be needed. I'd like to get the
> fastest possible access rates from a single machine to the data.
> Ideally 90MB/s+

Is this vastly read-only or will write speed also be a factor?

-HJC
Re: lilo error for degraded root device
> /stage/etc/lilo.conf =
> boot = /dev/md0

The error is here. The boot device must be a real disk, not the md
device. See:

ftp://ftp.bizsystems.net/pub/raid/Boot+Root+Raid+LILO.html

for examples. (A minimal example also follows below.)

> delay = 5
> vga = normal
> root = /dev/md0
> image = /boot/bzImage
>     label = linux
>
> When I run lilo I get the following:
>
> [root@otherweb /root]# lilo -r /stage
> boot = /dev/sdb, map = /boot/map.0811
> Added bzImage *
> Syntax error near line 2 in file /etc/lilo.conf

[EMAIL PROTECTED]
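[A minimal sketch of what such a lilo.conf might look like, assuming
/dev/sdb is the disk holding the surviving mirror half; the device
names follow Hugh's layout, but the stanza is an illustration, not a
tested configuration:]

    # /stage/etc/lilo.conf -- hypothetical sketch
    boot = /dev/sdb       # boot blocks go on a real disk, not /dev/md0
    delay = 5
    vga = normal
    image = /boot/bzImage
        label = linux
        root = /dev/md0   # the root fs itself can still be the md device

[With that in place, lilo -r /stage should install the loader on
/dev/sdb; once the degraded array boots, raidhotadd /dev/md0 /dev/sda1
would bring the old root partition into the mirror.]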
Re: speed and scaling
Seth Vidal wrote:

[monster data set description snipped]

> So we're considering the following:
> Dual Processor P3 something.
> ~1gb ram.
> multiple 75gb ultra 160 drives - probably ibm's 10krpm drives
> Adaptec's best 160 controller that is supported by linux.

[snip]

> So my questions are these:
> Is 90MB/s a reasonable speed to be able to achieve in a raid0 array
> across say 5-8 drives?

While you might get this from your controller data bus, I'm skeptical
of moving this much data consistently across the PCI bus. I think it
has a maximum of 133 MB/sec bandwidth (33 MHz * 32 bits wide),
especially if (below) you have some network access going on at
near-gigabit speeds... you're just pushing lots of data.

> What controllers/drives should I be looking at?

See if there is some sort of system you can get with multiple PCI
busses, bridged or whatnot.

> And has anyone worked with gigabit connections to an array of this
> size for nfs access? What sort of speeds can I optimally (figuring
> nfsv3 in async mode from the 2.2 patches or 2.4 kernels) expect to
> achieve for network access?

I've found vanilla nfs performance to be crummy, but haven't played
with it at all.

Ed
--
Edward Schernau          mailto:[EMAIL PROTECTED]
Network Architect        http://www.schernau.com
RC5-64#: 243249          e-gold acct #: 131897
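[For reference, the arithmetic behind Ed's figure, extended to the
64-bit/66MHz variants the rest of the thread recommends. These are
theoretical bus peaks; sustained throughput is noticeably lower:]

    32-bit / 33 MHz PCI:  33.3 MHz x 4 bytes = ~133 MB/s peak
    64-bit / 33 MHz PCI:  33.3 MHz x 8 bytes = ~266 MB/s peak
    64-bit / 66 MHz PCI:  66.6 MHz x 8 bytes = ~533 MB/s peak

[A sustained 90 MB/s disk stream plus ~90 MB/s of gigabit NFS traffic
would by itself nearly saturate a shared 32-bit/33 MHz bus, which is
why the later replies push toward 64-bit and/or 66 MHz PCI.]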
RE: speed and scaling
> -----Original Message-----
> From: Seth Vidal [mailto:[EMAIL PROTECTED]]
> Sent: Monday, July 10, 2000 12:23 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: speed and scaling
>
> So we're considering the following:
> Dual Processor P3 something.
> ~1gb ram.
> multiple 75gb ultra 160 drives - probably ibm's 10krpm drives
> Adaptec's best 160 controller that is supported by linux.
> The data does not have to be redundant or stable - since it can be
> restored from tape at almost any time. so I'd like to put this in a
> software raid 0 array for the speed.
>
> So my questions are these:
> Is 90MB/s a reasonable speed to be able to achieve in a raid0 array
> across say 5-8 drives?

Assuming sequential reads, you should be able to get this from good
drives.

> What controllers/drives should I be looking at?

I'm not familiar with current top-end drives, but you should be looking
for at least 4MB of cache on the drives. I think the best drives that
you'll find will be able to deliver 20MB/sec without trouble, possibly
a bit more. I seem to remember somebody on this list liking Adaptec
cards, but nobody on the SPARC lists will touch the things. I might
look at a Tekram, or a Symbios-based card; I've heard good things about
them, and they're used on some of the bigger machines that I've worked
with.

Later,
Grego
Re: speed and scaling
If you can afford it and this is for real work, you may want to
consider something like a Network Appliance Filer. It will be a lot
more robust and quite a bit faster than rolling your own array. The
downside is they are quite expensive. I believe the folks at Raidzone
make a "poor man's" canned array that can stuff almost a terabyte in
one box and uses cheaper IDE disks.

If you can't afford either of these solutions, 73gig Seagate Cheetahs
are becoming affordable. Packing one of those rackmount 8 bay
enclosures with these gets you over 500gb of storage if you just want
to stripe them together. That would likely be VERY fast for
reads/writes. The risk is that you'd lose everything if one of the
disks crashed.

Cheers,

Chris

From [EMAIL PROTECTED] Mon Jul 10 16:46:37 2000

Sounds like fun. Check out VA Linux's dual CPU boxes. They also offer
fast LVD SCSI drives which can be raided together. I've got one dual
P3-700 w/ dual 10k LVD drives. FAST!

I'd suggest staying away from NFS for performance reasons. I think
there is a better replacement out there ('coda' or something?). NFS
will work, but I don't think it's what you want. You could also try
connecting the machines through SCSI if you want to share files quickly
(I haven't done this, but have heard of it).

Good luck!

Phil

On Mon, Jul 10, 2000 at 03:22:46PM -0400, Seth Vidal wrote:
> Hi folks,
> I have an odd question. Where I work we will, in the next year, be in
> a position to have to process about a terabyte or more of data. The
> data is probably going to be shipped on tapes to us, but then it needs
> to be read from disks and analyzed. The process is segmentable, so
> it's reasonable to be able to break it down into 2-4 sections for
> processing, so arguably only 500gb per machine will be needed. I'd
> like to get the fastest possible access rates from a single machine to
> the data. Ideally 90MB/s+
> [...]

--
Philip Edelbrock -- IS Manager -- Edge Design, Corvallis, OR
[EMAIL PROTECTED] -- http://www.netroedge.com/~phil
PGP F16: 01 D2 FD 01 B5 46 F4 F0 3A 8B 9D 7E 14 7F FB 7A

--
Christopher Mauritz
[EMAIL PROTECTED]
Re: speed and scaling
> > arguably only 500gb per machine will be needed. I'd like to get the
> > fastest possible access rates from a single machine to the data.
> > Ideally 90MB/s+
>
> Is this vastly read-only or will write speed also be a factor?

mostly read-only.

-sv
RE: speed and scaling
i have not used adaptec 160 cards, but i have found most everything
else they make to be very finicky about cabling and termination, and
have had hard drives give trouble on adaptec that worked fine on other
cards. my money stays with a lsi/symbios/ncr based card. tekram is a
good vendor, and symbios themselves have a nice 64 bit wide, dual
channel pci scsi card.

which does lead to the point about pci. even _IF_ you could get the
entire pci bus to do your disk transfers, you will find that you would
still need more bandwidth for stuff like using your nics. so, i suggest
you investigate a motherboard with either 66mhz pci or 64 bit pci, or
both. perhaps alpha?

allan

Gregory Leblanc [EMAIL PROTECTED] said:
> [Seth's original specs and questions snipped -- see above]
>
> Assuming sequential reads, you should be able to get this from good
> drives. [...] I seem to remember somebody on this list liking Adaptec
> cards, but nobody on the SPARC lists will touch the things. I might
> look at a Tekram, or a Symbios-based card; I've heard good things
> about them, and they're used on some of the bigger machines that I've
> worked with.
>
> Later,
> Grego
Re: speed and scaling
> If you can afford it and this is for real work, you may want to
> consider something like a Network Appliance Filer. It will be a lot
> more robust and quite a bit faster than rolling your own array. The
> downside is they are quite expensive. I believe the folks at Raidzone
> make a "poor man's" canned array that can stuff almost a terabyte in
> one box and uses cheaper IDE disks.

I priced the netapps - they are ridiculously expensive. They estimated
1tb at about $60-100K - that's the size of our budget and we have other
things to get.

What I was thinking was a good machine with a 64bit pci bus and/or
multiple buses. And A LOT of external enclosures.

> If you can't afford either of these solutions, 73gig Seagate Cheetahs
> are becoming affordable. Packing one of those rackmount 8 bay
> enclosures with these gets you over 500gb of storage if you just want
> to stripe them together. That would likely be VERY fast for
> reads/writes. The risk is that you'd lose everything if one of the
> disks crashed.

this isn't much of a concern. The plan so far was this (and this plan
is dependent on what advice I get from here):

Raid0 for the read-only data (as it's all on tape anyway)
Raid5 or Raid1 for the writable data on a second scsi controller.

Does this sound reasonable? (A sketch of the raid0 half follows below.)

I've had some uncomfortable experiences with hw raid controllers - ie:
VERY poor performance and exorbitant prices. My SW raid experiences
under linux have been very good - excellent performance and easy setup
and maintenance. (well, virtually no maintenance :)

-sv
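[For concreteness, a minimal sketch of what the raid0 half of that plan
might look like in /etc/raidtab, assuming eight drives at sd[a-h]; the
device names and the chunk size are illustrative assumptions, though a
larger chunk size is commonly suggested for big sequential reads:]

    # Hypothetical 8-drive stripe for the read-only data set
    raiddev /dev/md0
        raid-level              0
        nr-raid-disks           8
        persistent-superblock   1
        chunk-size              128   # larger chunks favor streaming
        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1
        device                  /dev/sdc1
        raid-disk               2
        device                  /dev/sdd1
        raid-disk               3
        device                  /dev/sde1
        raid-disk               4
        device                  /dev/sdf1
        raid-disk               5
        device                  /dev/sdg1
        raid-disk               6
        device                  /dev/sdh1
        raid-disk               7

[mkraid /dev/md0 followed by mke2fs would bring it up; at the ~20MB/sec
per drive suggested earlier in the thread, eight spindles leave some
headroom above the 90MB/s target, bus bandwidth permitting.]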
Re: speed and scaling
You will definitely need that 64 bit PCI bus. You might want to watch
out for your memory bandwidth as well (i.e. get something with
interleaved memory). A standard PC doesn't get but 800MB/s peak to main
memory.

FWIW, you are going to have trouble pushing anywhere near 90MB/s out of
a gigabit ethernet card, at least under 2.2. I don't have any
experience w/ 2.4 yet.

On Mon, 10 Jul 2000, Seth Vidal wrote:
> I priced the netapps - they are ridiculously expensive. They estimated
> 1tb at about $60-100K - that's the size of our budget and we have
> other things to get. [...]
> Raid0 for the read-only data (as it's all on tape anyway)
> Raid5 or Raid1 for the writable data on a second scsi controller.
> Does this sound reasonable?
> [rest snipped -- see previous message]

---
Keith Underwood          Parallel Architecture Research Lab (PARL)
[EMAIL PROTECTED]        Clemson University
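[The 800MB/s figure matches the peak bandwidth of PC100 SDRAM, assuming
the usual 64-bit memory bus:]

    PC100 SDRAM: 100 MHz x 8 bytes = 800 MB/s peak

[Every block that moves disk -> page cache -> NIC crosses main memory
more than once, so a 90 MB/s disk stream plus network copies can eat a
surprising fraction of that budget; interleaving raises the ceiling.]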
Re: speed and scaling
I haven't had very good experiences with the Adaptec cards either. If
you can take the performance hit, the Mylex ExtremeRAID cards come in a
3-channel variety. You could then split your array into 3 chunks of 3-4
disks each and use hardware RAID instead of the software raidtools.

Cheers,

Chris

From [EMAIL PROTECTED] Mon Jul 10 17:10:27 2000

> i have not used adaptec 160 cards, but i have found most everything
> else they make to be very finicky about cabling and termination, and
> have had hard drives give trouble on adaptec that worked fine on other
> cards. my money stays with a lsi/symbios/ncr based card. tekram is a
> good vendor, and symbios themselves have a nice 64 bit wide, dual
> channel pci scsi card.
>
> which does lead to the point about pci. even _IF_ you could get the
> entire pci bus to do your disk transfers, you will find that you would
> still need more bandwidth for stuff like using your nics. so, i
> suggest you investigate a motherboard with either 66mhz pci or 64 bit
> pci, or both. perhaps alpha?
>
> allan
>
> [quoted thread snipped -- see above]

--
Christopher Mauritz
[EMAIL PROTECTED]
RE: speed and scaling
> i have not used adaptec 160 cards, but i have found most everything
> else they make to be very finicky about cabling and termination, and
> have had hard drives give trouble on adaptec that worked fine on other
> cards. my money stays with a lsi/symbios/ncr based card. tekram is a
> good vendor, and symbios themselves have a nice 64 bit wide, dual
> channel pci scsi card.

can you tell me the model number on that card?

> which does lead to the point about pci. even _IF_ you could get the
> entire pci bus to do your disk transfers, you will find that you would
> still need more bandwidth for stuff like using your nics.

right.

> so, i suggest you investigate a motherboard with either 66mhz pci or
> 64 bit pci, or both. perhaps alpha?

the money I would spend on an alpha precludes that option. But some of
dell's server systems support 64bit buses.

thanks
-sv
Re: speed and scaling
> FWIW, you are going to have trouble pushing anywhere near 90MB/s out
> of a gigabit ethernet card, at least under 2.2. I don't have any
> experience w/ 2.4 yet.

I hadn't planned on implementing this under 2.2 - I realize the
constraints on the network performance. I've heard good things about
2.4's ability to scale to those levels, though.

thanks for the advice.
-sv
Re: speed and scaling
On Mon, Jul 10, 2000 at 05:40:54PM -0400, Seth Vidal wrote:
> > FWIW, you are going to have trouble pushing anywhere near 90MB/s out
> > of a gigabit ethernet card, at least under 2.2. I don't have any
> > experience w/ 2.4 yet.
>
> I hadn't planned on implementing this under 2.2 - I realize the
> constraints on the network performance. I've heard good things about
> 2.4's ability to scale to those levels, though.

2.4.x technically doesn't exist yet. There are some (pre) test versions
by Linus and Alan Cox out awaiting feedback from testers, but nothing
solid or consistent yet. Be careful when using these for serious work.
Newer != Better

Phil

--
Philip Edelbrock -- IS Manager -- Edge Design, Corvallis, OR
[EMAIL PROTECTED] -- http://www.netroedge.com/~phil
PGP F16: 01 D2 FD 01 B5 46 F4 F0 3A 8B 9D 7E 14 7F FB 7A
RE: speed and scaling
I'd try an alpha machine, with a 66MHz/64-bit PCI bus and interleaved
memory access, to improve memory bandwidth. It costs around $1 with
512MB of RAM; see SWT (or STW) or Microway. This cost is small compared
to the disks.

I've never had trouble with adaptec cards, if you terminate things
according to specs, and preferably use terminators on the cables, not
the card's. In fact, I once had a termination problem because I was
using the card's terminator: it hosed my raid5 array, because there
were two disks on that card.

Another advantage of the alpha is that you have more PCI slots. I'd put
3 disks on each card, and use about 4 of them per machine. This should
be enough to get you 500GB (the arithmetic is sketched below). If there
is lots of network traffic, that will likely be your bottleneck,
particularly because of latency.

Might I also suggest a good UPS system? :-) Ah, and a journaling FS...
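[The capacity arithmetic behind that claim, assuming the 73gig Cheetahs
mentioned earlier in the thread -- the per-disk size is an assumption,
since this poster doesn't name one:]

    4 cards x 3 disks/card = 12 disks
    12 disks x 73 GB       = ~876 GB raw when striped
    even 12 x 45 GB drives = ~540 GB, clearing the 500 GB target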
Re: speed and scaling
> There are some (pre) test versions by Linus and Alan Cox out awaiting
> feedback from testers, but nothing solid or consistent yet. Be careful
> when using these for serious work. Newer != Better

This isn't being planned for the next few weeks - it's 2-6 month
planning that I'm doing. So I'm estimating that 2.4 should be out w/i 6
months. I think that's a reasonable guess.

-sv
RE: speed and scaling
> I'd try an alpha machine, with a 66MHz/64-bit PCI bus and interleaved
> memory access, to improve memory bandwidth. It costs around $1 with
> 512MB of RAM; see SWT (or STW) or Microway. This cost is small
> compared to the disks.

The alpha comes with other headaches I'd rather not involve myself
with. In addition, the cost of the disks is trivial - 7 75gig scsi's @
$1k each is only $7k - and the machine housing the disks also needs to
be one which will do some of the processing - and all of their code is
x86 - so I'm hesitant to suggest alphas for this.

> Another advantage of the alpha is that you have more PCI slots. I'd
> put 3 disks on each card, and use about 4 of them per machine. This
> should be enough to get you 500GB.

More how? The current boards I'm working with have 6-7 pci slots - no
ISA's at all. The alphas we have here have the same number of slots.

> Might I also suggest a good UPS system? :-) Ah, and a journaling FS...

the ups is a must - the journaling filesystem is at issue too. In an
ideal world there would be a journaling file system that works
correctly with sw raid :)

-sv
Re: speed and scaling
From [EMAIL PROTECTED] Mon Jul 10 17:53:34 2000

> > If you can take the performance hit, the Mylex ExtremeRAID cards
> > come in a 3-channel variety. You could then split your array into 3
> > chunks of 3-4 disks each and use hardware RAID instead of the
> > software raidtools.
>
> I've not had good performance out of mylex. In fact it's been
> down-right shoddy. I'm hesitant to purchase from them again.

Unfortunately, they are the Ferrari of the hardware RAID cards. Compare
an ExtremeRAID card against anything from DPT or ICP-Vortex. There is
no comparison. I'm not sure if it's poor hardware design or just
brilliant driver design by Leonard Zubkoff, but the Mylex cards are the
performance king for hardware RAID under Linux (and Windows NT/2K, for
that matter).

Cheers,

Chris
--
Christopher Mauritz
[EMAIL PROTECTED]
Re: speed and scaling
From [EMAIL PROTECTED] Mon Jul 10 18:43:11 2000

> > There are some (pre) test versions by Linus and Alan Cox out
> > awaiting feedback from testers, but nothing solid or consistent yet.
> > Be careful when using these for serious work. Newer != Better
>
> This isn't being planned for the next few weeks - it's 2-6 month
> planning that I'm doing. So I'm estimating that 2.4 should be out w/i
> 6 months. I think that's a reasonable guess.

That's a really bad assumption. 2.4 has been a "real soon now" item
since January and it still is hanging in the vapors. If you're doing
this for "production work," I'd plan on a 2.2 kernel or some known
"safe" 2.3 kernel.

C
--
Christopher Mauritz
[EMAIL PROTECTED]
Re: speed and scaling
On Mon, 10 Jul 2000, Seth Vidal wrote:
> What I was thinking was a good machine with a 64bit pci bus and/or
> multiple buses. And A LOT of external enclosures.

Multiple Mylex ExtremeRAID's.

> I've had some uncomfortable experiences with hw raid controllers -
> ie: VERY poor performance and exorbitant prices.

You're thinking of DPT :) The Mylex stuff (at least the low-end
AcceleRAID's) are cheap and not too slow.

--
Jon Lewis *[EMAIL PROTECTED]* |  I route
System Administrator          |  therefore you are
Atlantic Net                  |
_ http://www.lewis.org/~jlewis/pgp for PGP public key _
Re: speed and scaling
On Mon, 10 Jul 2000, Seth Vidal wrote:
> > > arguably only 500gb per machine will be needed. I'd like to get
> > > the fastest possible access rates from a single machine to the
> > > data. Ideally 90MB/s+
> >
> > Is this vastly read-only or will write speed also be a factor?
>
> mostly read-only.

If it were me, I'd do big RAID5 arrays. Sure, you have the data on
tape, but do you want to sit around while hundreds of GB are restored
from tape? RAID5 should give you the read speed of RAID0, and if you're
not writing much, the write penalty shouldn't be so bad. If it were
totally read-only, you could mount ro, and save yourself considerable
fsck time if there's an improper shutdown.

--
Jon Lewis *[EMAIL PROTECTED]* |  I route
System Administrator          |  therefore you are
Atlantic Net                  |
_ http://www.lewis.org/~jlewis/pgp for PGP public key _
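[A sketch of the read-only mount Jon suggests, assuming the bulk data
set lives on /dev/md0 and is mounted at /data -- both names are
illustrative:]

    # /etc/fstab entry -- mount the bulk data set read-only
    /dev/md0   /data   ext2   ro   0 0

    # or remount on the fly when alternating loading and analysis:
    mount -o remount,rw /data   # writable while restoring from tape
    mount -o remount,ro /data   # read-only for the analysis runs

[A filesystem that was mounted read-only at crash time needs little or
no fsck work afterward, which is the time savings he is pointing at.]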