Re: Adding disks -the pain. Also vinum
On Fri, Aug 06, 1999 at 10:53:54AM +0930, Greg Lehey wrote:
> On Tuesday, 3 August 1999 at 23:20:45 +0200, Bernd Walter wrote:
>> On Tue, Aug 03, 1999 at 03:59:46PM +0930, Greg Lehey wrote:
>>> On Tuesday, 3 August 1999 at 8:12:17 +0200, Bernd Walter wrote:
>>>> For UFS/FFS there is no point in setting the stripe size too low. It is generally slower to access 32k on different HDDs than to access 64k on one HDD.
>>>
>>> It is always slower where the positioning time is greater than the transfer time for 32 kB. On modern disks, 32 kB transfers in about 300 µs. The average rotational latency of a disk running at 10,800 rpm is 2.8 ms, and even with spindle synchronization there's no way to avoid rotational latency under these circumstances.
>>
>> It shouldn't be the latency, because with spindle sync it is the same on both disks if the transfer is requested at exactly the same time, which is of course idealized.
>
> Spindle sync ensures that the same sectors on different disks are under the heads at the same time. When you perform a stripe transfer, you're not accessing the same sectors, you're accessing different sectors. There's no way to avoid rotational latency under these circumstances.

We are talking about the same point, with the same results. I agree that you will only access the same sectors in some special cases. Let's say two striped disks with 512-byte stripes and an FFS with 1k frags.

>> The point is that you have more than a single transfer. With small transfers, spindle sync is able to win back some of the performance you have lost with a too-small stripe size.
>
> No, this isn't correct, unless you're running 512 byte stripes.

That's what I meant by 'a too-small stripe size'.

> In this case, a single-stripe transfer of, say, 8 kB with the disks above would take about 7 ms total latency (same as with a single disk), but the transfer would take less time--5 µs instead of 80 µs. You'd need 16 disks, and you'd tie them all up for 7 ms.

And this doesn't consider the time for SCSI command setup and such.
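The arithmetic behind the 8 kB / 16-disk example above can be checked numerically. This is only a back-of-the-envelope sketch, assuming the figures quoted in the thread (10,800 rpm, and a media rate implied by "32 kB in about 300 µs"); the 7 ms total-latency figure additionally includes seek time, which is ignored here:

```python
# Back-of-the-envelope numbers for the stripe-size argument above.
# Assumed figures (taken from the discussion, idealized): 10,800 rpm
# spindle speed, and a media rate implied by "32 kB in about 300 us".

RPM = 10_800
MEDIA_RATE = 32 * 1024 / 300e-6          # bytes/second, ~109 MB/s

def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution."""
    return 0.5 * 60.0 / rpm * 1000.0

def transfer_ms(nbytes: float) -> float:
    """Pure media transfer time for nbytes, ignoring seek/latency."""
    return nbytes / MEDIA_RATE * 1000.0

latency = rotational_latency_ms(RPM)      # ~2.78 ms, matching "2.8 ms" above
whole_read = transfer_ms(8 * 1024)        # 8 kB on one disk: ~75 us
per_disk = transfer_ms(512)               # one 512-byte stripe slice: ~4.7 us

# The transfer shrinks 16x with 16 disks, but every one of the 16 disks
# still pays the full rotational latency -- roughly 37x the time it
# takes to transfer the whole 8 kB from a single disk.
print(f"avg rotational latency: {latency:.2f} ms")
print(f"8 kB on one disk:       {whole_read * 1000:.0f} us")
print(f"512 B per disk (x16):   {per_disk * 1000:.1f} us")
```

This matches the numbers in the exchange: the latency term dominates the transfer term by orders of magnitude, so shrinking the per-disk transfer buys almost nothing.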
In the rare case where you need maximum bandwidth for only one application and one stream, I'm happy to hear that all drives are tied up in the job.

> Basically, this is not the way to go if you have multiple clients for your storage. Look at http://www.lemis.com/vinum/problems.html and http://www.lemis.com/vinum/Performance-issues.html for more details.
>
>> Spindle synchronisation won't bring you that much on modern HDDs - I tried it using 5 Seagate Elite 2.9G (5.25" full-height). It should be useful for RAID-3 and streaming video. In the case of large transfers it will make sense - but FFS is unable to set up big enough requests.
>
> No, this is a case where you're only using one client, so my argumentation above doesn't apply (since you're reading sequentially, latency is no longer an issue).

I don't know what bandwidth streaming video needs, but if you need the combined bandwidth of all the disks in use, the first thing to do is linearise access to the disks. Multi-file access often breaks linearisation. All I tried to say is that it is hopeless to expect much more bandwidth than a single disk gives in single-process access.

As an example: yesterday I was asked whether 6 old striped disks would be faster for cvsup than one modern disk, because a run sometimes takes more than one telephone unit. The answer is no. cvsupd (if run regularly) spends most of its time sending the directory content of the destination. Usually there are no other programs accessing any disks at the same time, so you benefit only a very little from additional drives - maybe from the additional block cache on the drives, and when updating atime. Believe it or not, multiple files are accessed concurrently on servers, and maybe under some window managers, but on many home and desktop machines that happens only rarely. As an example, I personally use 7 200M IBM disks striped into one volume (they all have LEDs :). The only way to utilise nearly all of them in a sensible way is writing with soft updates enabled.
--
B.Walter              COSMO-Project         http://www.cosmo-project.de
[EMAIL PROTECTED]     Usergroup             [EMAIL PROTECTED]

To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: Adding disks -the pain. Also vinum
On Tuesday, 3 August 1999 at 23:20:45 +0200, Bernd Walter wrote:
> On Tue, Aug 03, 1999 at 03:59:46PM +0930, Greg Lehey wrote:
>> On Tuesday, 3 August 1999 at 8:12:17 +0200, Bernd Walter wrote:
>>> For UFS/FFS there is no point in setting the stripe size too low. It is generally slower to access 32k on different HDDs than to access 64k on one HDD.
>>
>> It is always slower where the positioning time is greater than the transfer time for 32 kB. On modern disks, 32 kB transfers in about 300 µs. The average rotational latency of a disk running at 10,800 rpm is 2.8 ms, and even with spindle synchronization there's no way to avoid rotational latency under these circumstances.
>
> It shouldn't be the latency, because with spindle sync it is the same on both disks if the transfer is requested at exactly the same time, which is of course idealized.

Spindle sync ensures that the same sectors on different disks are under the heads at the same time. When you perform a stripe transfer, you're not accessing the same sectors, you're accessing different sectors. There's no way to avoid rotational latency under these circumstances.

> The point is that you have more than a single transfer. With small transfers, spindle sync is able to win back some of the performance you have lost with a too-small stripe size.

No, this isn't correct, unless you're running 512 byte stripes. In this case, a single-stripe transfer of, say, 8 kB with the disks above would take about 7 ms total latency (same as with a single disk), but the transfer would take less time--5 µs instead of 80 µs. You'd need 16 disks, and you'd tie them all up for 7 ms. Basically, this is not the way to go if you have multiple clients for your storage. Look at http://www.lemis.com/vinum/problems.html and http://www.lemis.com/vinum/Performance-issues.html for more details.
> Spindle synchronisation won't bring you that much on modern HDDs - I tried it using 5 Seagate Elite 2.9G (5.25" full-height). It should be useful for RAID-3 and streaming video. In the case of large transfers it will make sense - but FFS is unable to set up big enough requests.

No, this is a case where you're only using one client, so my argumentation above doesn't apply (since you're reading sequentially, latency is no longer an issue).

Greg

--
See complete headers for address, home page and phone numbers
finger g...@lemis.com for PGP public key
Re: Adding disks -the pain. Also vinum
On Tue, Aug 03, 1999 at 03:59:46PM +0930, Greg Lehey wrote:
> On Tuesday, 3 August 1999 at 8:12:17 +0200, Bernd Walter wrote:
>> For UFS/FFS there is no point in setting the stripe size too low. It is generally slower to access 32k on different HDDs than to access 64k on one HDD.
>
> It is always slower where the positioning time is greater than the transfer time for 32 kB. On modern disks, 32 kB transfers in about 300 µs. The average rotational latency of a disk running at 10,800 rpm is 2.8 ms, and even with spindle synchronization there's no way to avoid rotational latency under these circumstances.

It shouldn't be the latency, because with spindle sync it is the same on both disks if the transfer is requested at exactly the same time, which is of course idealized. The point is that you have more than a single transfer. With small transfers, spindle sync is able to win back some of the performance you have lost with a too-small stripe size.

Spindle synchronisation won't bring you that much on modern HDDs - I tried it using 5 Seagate Elite 2.9G (5.25" full-height). It should be useful for RAID-3 and streaming video. In the case of large transfers it will make sense - but FFS is unable to set up big enough requests.

--
B.Walter              COSMO-Project         http://www.cosmo-project.de
[EMAIL PROTECTED]     Usergroup             [EMAIL PROTECTED]
Re: Adding disks -the pain. Also vinum
On Tue, Aug 03, 1999 at 01:35:54PM +0930, Greg Lehey wrote:
> On Tuesday, 3 August 1999 at 11:11:39 +0800, Stephen Hocking-Senior Programmer PGS Tensor Perth wrote:
>
> No, it would cause a higher I/O load. Vinum doesn't transfer entire stripes, it transfers what you ask for. With a large stripe size, the chances are higher that you can perform the transfer with only a single I/O.

If you use n*64K stripes, UFS/FFS should never access 2 disks at once.

>> Looking at the systat display, the 8k fs blocks do seem to be clustered into larger requests, so I'm not too worried about the FS block size. What have people observed with trying larger FS block sizes?
>
> I don't know if anybody has tried larger FS blocks than 8 kB. I once created a file system with 256 kB blocks (just to see if it could be done). I also tried 512 kB blocks, but newfs died of an overflow. I'd expect that you would see a marked drop in performance, assuming that it would work at all.

AFAIK the practical limit is 64k, because clustering is limited to 64k and the fs doesn't seem to handle larger blocks well. I use 64k very often, because with this block size my growfs tool is already able to grow an FFS beyond 1 TB.

--
B.Walter              COSMO-Project         http://www.cosmo-project.de
ti...@cicely.de       Usergroup             i...@cosmo-project.de
Re: Adding disks -the pain. Also vinum
On Tue, Aug 03, 1999 at 12:16:06PM +0800, Stephen Hocking-Senior Programmer PGS Tensor Perth wrote:
>> No, it would cause a higher I/O load. Vinum doesn't transfer entire stripes, it transfers what you ask for. With a large stripe size, the chances are higher that you can perform the transfer with only a single I/O.
>
> Even if I'm using really large reads?

Several months ago I believed the same, but there are several points here:
- UFS/FFS doesn't handle clustering over 64k.
- Modern hard disks do preread, simply by having a reversed sector layout.
- Without spindle synchronisation you will have additional latency.
- Vinum doesn't aggregate accesses to subdisks, so the transfer to each subdisk is limited by the stripe size.

For UFS/FFS there is no point in setting the stripe size too low. It is generally slower to access 32k on different HDDs than to access 64k on one HDD. Spindle synchronisation won't bring you that much on modern HDDs - I tried it using 5 Seagate Elite 2.9G (5.25" full-height). There was no win using FFS.

If you need performance, try soft updates. At least for writing, it should benefit a lot from striped partitions. I never really measured it, but I was astonished that you can get over 800 transactions/sec on a ccd with 6 striped disks.

--
B.Walter              COSMO-Project         http://www.cosmo-project.de
ti...@cicely.de       Usergroup             i...@cosmo-project.de
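The last point - that a request gets split at every stripe boundary when the driver doesn't aggregate subdisk accesses - can be illustrated with a small sketch. This is a simplified counting model, not vinum's actual code:

```python
# Simplified model (not vinum's actual code): if a striping driver
# issues one subdisk I/O per stripe-sized slice of a request, count
# how many I/Os a single logical request generates.
import math

def subdisk_ios(request_bytes: int, stripe_bytes: int, offset: int = 0) -> int:
    """Number of subdisk I/Os for a request starting at a volume offset."""
    first = offset % stripe_bytes        # misalignment within first stripe
    return math.ceil((first + request_bytes) / stripe_bytes)

# A 64 kB read (the FFS clustering limit mentioned above) against
# various stripe sizes:
for stripe in (512, 32 * 1024, 64 * 1024, 256 * 1024):
    print(f"{stripe:>7}-byte stripes -> {subdisk_ios(64 * 1024, stripe)} I/Os")
# 512-byte stripes turn one read into 128 separate I/Os, while 256 kB
# stripes usually satisfy it with a single I/O (two if it straddles a
# stripe boundary).
```

This is why the thread keeps recommending large stripe sizes for FFS: the file system never issues requests bigger than 64k anyway, so small stripes only multiply the per-I/O overhead.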
Re: Adding disks -the pain. Also vinum
On Tuesday, 3 August 1999 at 8:12:17 +0200, Bernd Walter wrote:
> On Tue, Aug 03, 1999 at 12:16:06PM +0800, Stephen Hocking-Senior Programmer PGS Tensor Perth wrote:
>>> No, it would cause a higher I/O load. Vinum doesn't transfer entire stripes, it transfers what you ask for. With a large stripe size, the chances are higher that you can perform the transfer with only a single I/O.
>>
>> Even if I'm using really large reads?
>
> Several months ago I believed the same, but there are several points here:
> - UFS/FFS doesn't handle clustering over 64k.
> - Modern hard disks do preread, simply by having a reversed sector layout.
> - Without spindle synchronisation you will have additional latency.
> - Vinum doesn't aggregate accesses to subdisks, so the transfer to each subdisk is limited by the stripe size.

Note, BTW, that this wouldn't make much sense. To aggregate access to consecutive stripes, your transfer would have to involve *all* the disks in the stripe set, which would be a ridiculous performance hit. Read http://www.lemis.com/vinum/Performance-issues.html for more details.

> For UFS/FFS there is no point in setting the stripe size too low. It is generally slower to access 32k on different HDDs than to access 64k on one HDD.

It is always slower where the positioning time is greater than the transfer time for 32 kB. On modern disks, 32 kB transfers in about 300 µs. The average rotational latency of a disk running at 10,800 rpm is 2.8 ms, and even with spindle synchronization there's no way to avoid rotational latency under these circumstances.

> Spindle synchronisation won't bring you that much on modern HDDs - I tried it using 5 Seagate Elite 2.9G (5.25" full-height).

It should be useful for RAID-3 and streaming video.

Greg

--
See complete headers for address, home page and phone numbers
finger g...@lemis.com for PGP public key
Re: Adding disks -the pain. Also vinum
On Tuesday, 3 August 1999 at 11:11:39 +0800, Stephen Hocking-Senior Programmer PGS Tensor Perth wrote:
> The people I work for were about to junk a bunch of 6-year-old disks when I snaffled them. Among them were 4 DEC DSP5400S (3.8GB each), with a nice external case. These disks had been doing duty on a boat carrying out seismic surveys, attached to misc. Sun workstations. They are typical of their vintage - full-height 5 1/4" drives, fast narrow SCSI-2, and noisy as all blazes. I have them hooked up to an NCR810, as one striped FS (it's just for experiments, not valuable data). fdisking them was easy, but disklabelling them was a royal pain. I ended up editing the /etc/disktab file to add an appropriate label and running disklabel -w -B /dev/rda0c DSP5400S, which still gives an error message, but appears to install the label. I only found out that it had installed the label by accident, wasting a bunch of time in the process.

Did you try 'disklabel -w da0 auto'?

> I created a striped volume across the 4 drives with the default stripe size of 256K. I read the rather interesting discussion within the man pages about the optimal stripe size and have a couple of queries. Firstly, the type of traffic that this 13.9GB filesystem will see will be mainly sequential reading and writing of large files. There will only be a few files (~2-30), each several gigs. (I'm fooling around with the seismic software at home, and typical surveys can result in files many gigs in size). Given that FreeBSD breaks I/Os down into 64k chunks, would having a 64k stripe size give more parallelism?

No, it would cause a higher I/O load. Vinum doesn't transfer entire stripes, it transfers what you ask for. With a large stripe size, the chances are higher that you can perform the transfer with only a single I/O.

> I'm seeing 4.4MB/s if I read from an individual disk, but only about 5.6MB/s when reading from the striped volume.

How many concurrent processes?
Remember that striping doesn't buy you anything with a single process. You might like to try rawio (ftp://ftp.lemis.com/pub/rawio.tar.gz) and see what that tells you.

> Looking at the systat display, the 8k fs blocks do seem to be clustered into larger requests, so I'm not too worried about the FS block size. What have people observed with trying larger FS block sizes?

I don't know if anybody has tried larger FS blocks than 8 kB. I once created a file system with 256 kB blocks (just to see if it could be done). I also tried 512 kB blocks, but newfs died of an overflow. I'd expect that you would see a marked drop in performance, assuming that it would work at all.

Greg

--
See complete headers for address, home page and phone numbers
finger g...@lemis.com for PGP public key
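For reference, the striped setup discussed in this exchange (4 drives, vinum's default 256 kB stripe size) would be described to vinum with a configuration file roughly like the following sketch. The drive names, volume name, and partition letters here are illustrative assumptions, not taken from the original messages:

```
# Hypothetical vinum config: 4-way stripe, 256 kB stripe size.
# Device paths and names are examples only.
drive d1 device /dev/da1e
drive d2 device /dev/da2e
drive d3 device /dev/da3e
drive d4 device /dev/da4e
volume seismic
  plex org striped 256k
    sd length 0 drive d1
    sd length 0 drive d2
    sd length 0 drive d3
    sd length 0 drive d4
```

Assuming the file were saved as /etc/vinum.conf, it would be loaded with `vinum create /etc/vinum.conf`, after which the volume could be newfs'd and mounted as /dev/vinum/seismic (length 0 meaning "use the rest of the drive").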
Re: Adding disks -the pain. Also vinum
> Did you try 'disklabel -w da0 auto'?

Yup - it also complained.

> No, it would cause a higher I/O load. Vinum doesn't transfer entire stripes, it transfers what you ask for. With a large stripe size, the chances are higher that you can perform the transfer with only a single I/O.

Even if I'm using really large reads?

>> I'm seeing 4.4MB/s if I read from an individual disk, but only about 5.6MB/s when reading from the striped volume.
>
> How many concurrent processes? Remember that striping doesn't buy you anything with a single process. You might like to try rawio (ftp://ftp.lemis.com/pub/rawio.tar.gz) and see what that tells you.

OK, I was just using good ol' dd, with dd if=/cfs/foo of=/dev/null bs=2m

>> Looking at the systat display, the 8k fs blocks do seem to be clustered into larger requests, so I'm not too worried about the FS block size. What have people observed with trying larger FS block sizes?
>
> I don't know if anybody has tried larger FS blocks than 8 kB. I once created a file system with 256 kB blocks (just to see if it could be done). I also tried 512 kB blocks, but newfs died of an overflow. I'd expect that you would see a marked drop in performance, assuming that it would work at all.

OK. The minimum data size read from these files tends to be about 10k. I'll have to try this all with a real app.

Stephen

--
The views expressed above are not those of PGS Tensor.
"We've heard that a million monkeys at a million keyboards could produce the Complete Works of Shakespeare; now, thanks to the Internet, we know this is not true." - Robert Wilensky, University of California