Re: [GENERAL] SCSI vs. IDE performance test
Base-two arithmetic sounds pretty broad. If only you could come up with a scheme for division and multiplication by powers of two through bitshifting.

On Wed, 26 Nov 2003, Tom Lane wrote:

Randolf Richardson [EMAIL PROTECTED] writes:

They've managed to patent ye olde elevator algorithm?? The USPTO really is without a clue, isn't it :-(

It's not the USPTO's fault -- the problem is that nobody objected to it while it was in the Patent Pending state.

If their examiner had even *minimal* competency in the field, it would not have gotten to the Patent Pending state. Algorithms that are well documented in the standard textbooks of thirty years ago do not qualify as something people should have to stand guard against. Perhaps I should try to patent base-two arithmetic, and hope no one notices till it goes through ... certainly the USPTO won't notice ...

regards, tom lane
Re: [GENERAL] SCSI vs. IDE performance test
Ben wrote:

Base-two arithmetic sounds pretty broad. If only you could come up with a scheme for division and multiplication by powers of two through bitshifting.

I already have that patent! :-)

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [GENERAL] SCSI vs. IDE performance test
"we have no portable means of expressing that exact constraint to the kernel" Does this mean that specific operating systems have a better way of dealing with this? Which ones and how?

I'm not aware of any that offer a way of expressing "write these particular blocks before those particular blocks". It doesn't seem like it would require rocket scientists to devise such an API, but no one's got round to it yet. Part of the problem is that the issue would have to be approached at multiple levels: there is no point in offering an OS-level API for this when the hardware underlying the bus-level API (IDE) is doing its level best to sabotage the entire semantics.

[sNip]

Actually, NetWare is one OS that does this, and has been doing so since the 1980s with version 2 (version 6.5 is the current version today). They have a patented caching algorithm called Elevator Seeking, which both prolongs the life of the drive by reducing wear and tear and improves read/write performance by minimizing seek operations. With IDE it seems that this caching algorithm is also beneficial, but it really shines with SCSI drives.

In all my experience, SCSI drives are much faster and far more reliable than IDE drives. I've always assumed that it boils down to "you get what you pay for."

--
Randolf Richardson - [EMAIL PROTECTED]
Inter-Corporate Computer Network Services, Inc.
Vancouver, British Columbia, Canada
http://www.8x.ca/
This message originated from within a secure, reliable, high-performance network ... a Novell NetWare network.
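For readers who have not met the technique, here is a minimal sketch of the textbook elevator (SCAN) ordering that the name refers to. It is not NetWare's implementation, which this thread does not describe; the function names, track numbers, and starting head position are all invented for illustration, and head travel in tracks is only a rough stand-in for seek time.

```python
def elevator_order(pending, head, moving_up=True):
    """Order requests the way a one-sweep elevator would service them.

    Tracks at or beyond the head in the current direction are served first,
    in sweep order; the rest are served on the way back.
    """
    above = sorted(t for t in pending if t >= head)
    below = sorted((t for t in pending if t < head), reverse=True)
    return above + below if moving_up else below + above


def head_travel(order, head):
    """Total distance (in tracks) the head moves to service `order`."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


# Hypothetical request queue and starting head position.
requests = [98, 183, 37, 122, 14, 124, 65, 67]
start = 53

print("arrival order:", head_travel(requests, start), "tracks")
print("elevator order:", head_travel(elevator_order(requests, start), start), "tracks")
```

In this made-up queue the sweep covers less than half the distance of servicing requests in arrival order, which is the effect the post attributes to minimizing seek operations.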
Re: [GENERAL] SCSI vs. IDE performance test
Tom, this discussion brings up something that's been bugging me about the recommendations for getting more performance out of PG, in particular the one that suggests you put your WAL files on a different physical drive from the database. Consider the following scenario:

Database on drive1
WAL on drive2

1. PG write of some sort occurs.
2. PG writes out the WAL.
3. PG writes out the data.
4. PG updates the WAL to reflect data actually written.
5. System crashes/reboots/whatever.

With the DB and the WAL on different drives, it seems possible to me that drive2 could've fsync()'d or otherwise properly written all of the data out, but drive1 could have failed somewhere along the way and not actually written the data to the DB. The next time PG is brought up, the WAL would indicate the transaction, as it were, was a success, but the data wouldn't actually be there.

In the case of using only one drive, the rollback (from a FS perspective) couldn't possibly occur in such a way as to leave step 4 as a success but step 3 as a failure. Worst case, the data would be written out but the WAL wouldn't have been updated (rolled back, say, by the FS), and thus PG will roll back the data itself, or use whatever mechanism it uses to ensure the data is consistent with the WAL.

Am I smoking something here, or is this a real, if rare in practice, risk that occurs when you have the WAL on a different drive than the data is on?

At 17:39 10/27/2003, Tom Lane wrote:

Rick Gigger [EMAIL PROTECTED] writes:

It seems to me file system journaling should fix the whole problem by giving you a record of what was actually committed to disk and what was not.

Nope, a journaling FS has exactly the same problem Postgres does (because the underlying WAL concept is the same: write the log entries before you change the files they describe). If the drive lies about write order, the FS can be screwed just as badly.
Now the FS code might have a low-level way to force write order that Postgres doesn't have access to ... but simply uttering the magic incantation "journaling file system" will not make this problem disappear.

regards, tom lane
Re: [GENERAL] SCSI vs. IDE performance test
Thanks! Now it is much, much more clear. It leaves me with a few additional questions though.

Question 1: "we have no portable means of expressing that exact constraint to the kernel" Does this mean that specific operating systems have a better way of dealing with this? Which ones and how? I'm guessing that it couldn't make too big of a performance difference or it would probably be implemented already.

Question 2: Do serial ATA drives suffer from the same issue?

- Original Message -
From: Tom Lane [EMAIL PROTECTED]
To: Rick Gigger [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 27, 2003 5:05 PM
Subject: Re: [GENERAL] SCSI vs. IDE performance test

Rick Gigger [EMAIL PROTECTED] writes:

ahhh. "lies about write order" is the phrase that I was looking for. That seemed to make sense but I didn't know if I could go directly from lying about fsync to that. Obviously I don't understand exactly what fsync is doing.

What we actually care about is write order: WAL entries have to hit the platter before the corresponding data-file changes do. Unfortunately we have no portable means of expressing that exact constraint to the kernel. We use fsync() (or related constructs) instead: issue the WAL writes, fsync the WAL file, then issue the data-file writes. This constrains the write ordering more than is really needed, but it's the best we can do in a portable Unix application.

The problem is that the kernel thinks fsync is done when the disk drive reports the writes are complete. When we say a drive lies about this, we mean it accepts a sector of data into its on-board RAM and then immediately claims write-complete, when in reality the data hasn't hit the platter yet and will be lost if power dies before the drive gets around to writing it.

So we can have a scenario where we think WAL is down to disk and go ahead with issuing data-file writes. These will also be shoved over to the drive and stored in its on-board RAM.
Now the drive has multiple sectors pending write in its buffers. If it chooses to write these in some order other than the order they were given to it, it could write the data file updates to disk first. If power drops *now*, we lose, because the data files are inconsistent and there's no WAL entry to tell us to fix it.

Got it? It's really the combination of "lie about write completion" and "write pending sectors out of order" that can mess things up.

The reason IDE drives have to do this for reasonable performance is that the IDE interface is single-threaded: you can only have one read or write in process at a time, from the point of view of the kernel-to-drive interface. But in order to schedule reads and writes in a way that makes sense physically (minimizes seeks), the drive has to have multiple read and write requests pending that it can pick and choose from. The only possibility to do that in the IDE world is to let a write complete in interface terms before it's really done ... that is, lie.

The reason SCSI drives do *not* do this is that the SCSI interface is logically multi-threaded: you can have multiple reads or writes pending at once. When you want to write on a SCSI drive, you send over a command that says "write this data at this sector". Sometime later the drive sends back a status report "yessir boss, I done did that write". Similarly, a read consists of a command "read this sector", followed sometime later by a response that delivers the requested data. But you can send other commands to read or write other sectors meanwhile, and the drive is free to reorder them to suit its convenience. So in the SCSI world, there is no need for the drive to lie in order to do its own read/write scheduling. The kernel knows the truth about whether a given sector has hit disk, and so it won't conclude that the WAL file has been completely fsync'd until it really is all down to the platter.
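The ordering rule described in this message (issue the WAL write, fsync the WAL file, then issue the data-file write) can be sketched in a few lines of Python. This is an illustration only, not PostgreSQL's code: the function name `logged_update`, its parameters, and the idea of a single log record per change are assumptions made for the example.

```python
import os


def logged_update(wal_path, data_path, log_record, offset, payload):
    """Write a log record, flush it, and only then touch the data file."""
    # Step 1: append the log record to the WAL file.
    wal_fd = os.open(wal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(wal_fd, log_record)
        # Step 2: ask the kernel to push the WAL down to the drive.
        # fsync returns once the drive reports completion, so it is only
        # as trustworthy as that report.
        os.fsync(wal_fd)
    finally:
        os.close(wal_fd)

    # Step 3: the data-file write is issued strictly after the fsync above.
    data_fd = os.open(data_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.lseek(data_fd, offset, os.SEEK_SET)
        os.write(data_fd, payload)
    finally:
        os.close(data_fd)
```

Nothing in this code can detect a drive that acknowledges the WAL write from its on-board RAM; if the drive then writes the data-file sector first and power fails, the ordering the application asked for is silently lost, which is exactly the failure described above.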
This is also why SCSI disks shine on the read side when you have lots of processes doing reads: in an IDE drive, there is no way for the drive to satisfy read requests in any order but the one they're issued in. If the kernel guesses wrong about the best ordering for a set of read requests, then everybody waits for the seeks needed to get the earlier processes' data. A SCSI drive can fetch the nearest data first, and then that requester is freed to make progress in the CPU while the other guys wait for their longer seeks. There's no win here with a single active user process (since it probably wants specific data in a specific order), but it's a huge win if lots of processes are making unrelated read requests.

Clear now? (In a previous lifetime I wrote SCSI disk driver code ...)

regards, tom lane
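To make the read-side argument concrete, here is a toy model comparing how long each requester waits when reads are served in issue order versus nearest-first. It is a sketch under loud assumptions: sector numbers stand in for physical position, head travel stands in for time, rotational delay is ignored, and the function names and request values are invented for the example. Real drive firmware is certainly more sophisticated.

```python
def nearest_first(pending, head):
    """Serve whichever pending sector is closest to the current head position."""
    remaining = list(pending)
    order = []
    while remaining:
        nxt = min(remaining, key=lambda sector: abs(sector - head))
        remaining.remove(nxt)
        order.append(nxt)
        head = nxt
    return order


def waits(order, head):
    """Map each sector to the head travel accumulated before it is delivered."""
    travelled = 0
    result = {}
    for sector in order:
        travelled += abs(sector - head)
        head = sector
        result[sector] = travelled
    return result


# Four hypothetical processes each want one sector; the head starts at 100.
issued = [900, 110, 850, 120]

print("issue order:  ", waits(issued, 100))
print("nearest first:", waits(nearest_first(issued, 100), 100))
```

In this example the processes wanting sectors 110 and 120 are served almost immediately under nearest-first, and even the request for sector 900 waits no longer than it did in issue order. With a single process reading in a fixed order there is nothing to reorder, matching the caveat in the message.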
Re: [GENERAL] SCSI vs. IDE performance test
Rick Gigger [EMAIL PROTECTED] writes:

"we have no portable means of expressing that exact constraint to the kernel" Does this mean that specific operating systems have a better way of dealing with this? Which ones and how?

I'm not aware of any that offer a way of expressing "write these particular blocks before those particular blocks". It doesn't seem like it would require rocket scientists to devise such an API, but no one's got round to it yet. Part of the problem is that the issue would have to be approached at multiple levels: there is no point in offering an OS-level API for this when the hardware underlying the bus-level API (IDE) is doing its level best to sabotage the entire semantics.

Do serial ATA drives suffer from the same issue?

Um, not an expert, but I think ATA is the same as IDE except for bus width and transfer rate. If either one allows for multiple concurrent read/write transactions I'll be very surprised.

regards, tom lane
Re: [GENERAL] SCSI vs. IDE performance test
On Tue, Oct 28, 2003 at 12:17:59AM -0500, Tom Lane wrote:

Rick Gigger [EMAIL PROTECTED] writes:

Do serial ATA drives suffer from the same issue?

Um, not an expert, but I think ATA is the same as IDE except for bus width and transfer rate. If either one allows for multiple concurrent read/write transactions I'll be very surprised.

Well, some googling around seems to indicate that Serial ATA I/ATA-6 has Tagged Command Queueing (TCQ), which adds this feature specifically. Whether it is a mandatory part of the spec I don't know.

--
Martijn van Oosterhout [EMAIL PROTECTED] http://svana.org/kleptog/
"All that is needed for the forces of evil to triumph is for enough good men to do nothing." - Edmund Burke
"The penalty good people pay for not being interested in politics is to be governed by people worse than themselves." - Plato
Re: [GENERAL] SCSI vs. IDE performance test
Martijn van Oosterhout [EMAIL PROTECTED] writes:

Well, some googling around seems to indicate that Serial ATA I/ATA-6 has Tagged Command Queueing (TCQ), which adds this feature specifically. Whether it is a mandatory part of the spec I don't know.

Yeah? If so, and *if fully implemented* on both sides of the interface, this would eliminate the architectural advantages I was just sketching for SCSI. I can't claim to be up on what's happening in the IDE/ATA world though...

regards, tom lane
Re: [GENERAL] SCSI vs. IDE performance test
It seems to me file system journaling should fix the whole problem by giving you a record of what was actually committed to disk and what was not. I must not understand journaling correctly. Can anyone explain to me how journaling works?

- Original Message -
From: Bruce Momjian [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Stephen [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, October 27, 2003 12:14 PM
Subject: Re: [GENERAL] SCSI vs. IDE performance test

Mike Benoit wrote:

I just ran some benchmarks against a 10K SCSI drive and 7200 RPM IDE drive here: http://fsbench.netnation.com/ The results vary quite a bit, and it seems the file system you use can make a huge difference. SCSI is obviously faster, but a 20% performance gain for 5x the cost is only worth it for a very small percentage of people, I would think.

Did you turn off the IDE write cache? If not, the SCSI drive is reliable in case of OS failure, while the IDE is not.

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [GENERAL] SCSI vs. IDE performance test
-Original Message-
From: Stephen [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 22, 2003 9:02 AM
To: [EMAIL PROTECTED]
Subject: Re: [GENERAL] SCSI vs. IDE performance test

The SCSI improvement over IDE seems overrated in the test. I would have expected at most a 30% improvement. Other reviews seem to point out that IDE performs just as well or better. See Tom's Hardware: http://www20.tomshardware.com/storage/20020305/index.html

My own tests show that 15K RPM Ultra320 SCSI drives are considerably faster than any IDE storage.

This ATA drive: http://www.wdc.com/en/products/WD360GD.asp performs as well as or better than many SCSI drives, and is not terribly expensive. It is therefore a very good choice if price/performance is more important than absolute performance.

But if you need absolute horsepower, then one of these (or another 15K Ultra320 equivalent) won't be beaten: http://www.storagereview.com/articles/200304/200304068C073x0_1.html
Re: [GENERAL] SCSI vs. IDE performance test
Unwrap this link (if your newsreader folds it) and click on it for hard drive performance:

http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devID_0=232&devID_1=237&devID_2=213&devID_3=221&devID_4=216&devID_5=249&devID_6=250&devCnt=7

The important part for databases is "Server Suite".
Re: [GENERAL] SCSI vs. IDE performance test
Dann Corbit wrote:

Unwrap this link (if your newsreader folds it) and click on it for hard drive performance:

http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devID_0=232&devID_1=237&devID_2=213&devID_3=221&devID_4=216&devID_5=249&devID_6=250&devCnt=7

The important part for databases is "Server Suite".

Fairly old data, but it shows AMAZING differences in head seek time. I didn't know head seeks were below 8ms for anything, even today.

Also, from what I've read, the SATA drives of those days were nonexistent? The earliest SATA drives I've read about were just SATA interfaces on OLDER IDE hardware - the manufacturers had not really signed up on the concept enough to put their good hardware underneath the interface.

--
"You are behaving like a man" is an insult from some women, a compliment from a good woman.