Re: [GENERAL] SCSI vs. IDE performance test

2003-11-26 Thread Ben
Base-two arithmetic sounds pretty broad. If only you could come up with a
scheme for division and multiplication by powers of two through
bitshifting.

On Wed, 26 Nov 2003, Tom Lane wrote:

 Randolf Richardson [EMAIL PROTECTED] writes:
  They've managed to patent ye olde elevator algorithm??  The USPTO really
  is without a clue, isn't it :-(
 
  It's not the USPTO's fault -- the problem is that nobody objected to it
  while it was in the Patent Pending state.
 
 If their examiner had even *minimal* competency in the field, it would
 not have gotten to the Patent Pending state.  Algorithms that are well
 documented in the standard textbooks of thirty years ago do not qualify
 as something people should have to stand guard against.
 
 Perhaps I should try to patent base-two arithmetic, and hope no one
 notices till it goes through ... certainly the USPTO won't notice ...
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly
 



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [GENERAL] SCSI vs. IDE performance test

2003-11-26 Thread Bruce Momjian
Ben wrote:
 Base-two arithmetic sounds pretty broad. If only you could come up with a
 scheme for division and multiplication by powers of two through
 bitshifting.

I already have that patent!  :-)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [GENERAL] SCSI vs. IDE performance test

2003-11-24 Thread Randolf Richardson, DevNet SysOp 29
 we have no portable means of expressing that exact constraint to the
 kernel
 Does this mean that specific operating systems have a better way of
 dealing with this?  Which ones and how?
 
 I'm not aware of any that offer a way of expressing "write these
 particular blocks before those particular blocks".  It doesn't seem like
 it would require rocket scientists to devise such an API, but no one's
 got round to it yet.  Part of the problem is that the issue would have
 to be approached at multiple levels: there is no point in offering an
 OS-level API for this when the hardware underlying the bus-level API
 (IDE) is doing its level best to sabotage the entire semantics.
[sNip]

Actually, NetWare is one OS that does this, and has been doing so 
since the 1980s with version 2 (version 6.5 is the current version today).  
They have a patented caching algorithm called Elevator Seeking which both 
prolongs the life of the drive by reducing wear-and-tear and improves 
read/write performance by minimizing seek operations.
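
For readers unfamiliar with the term, the sketch below shows the generic textbook elevator (SCAN) ordering: serve everything in the current sweep direction, then reverse. It is an illustration only, not NetWare's actual implementation; the function name and the track numbers are made up for the example.

```python
def elevator_order(head, requests, moving_up=True):
    """Order pending track requests the way an elevator serves floors.

    Tracks at or beyond the head in the current direction are served
    first, in sweep order; the rest are served on the way back.
    """
    above = sorted(t for t in requests if t >= head)
    below = sorted((t for t in requests if t < head), reverse=True)
    return above + below if moving_up else below + above


# Hypothetical queue of track numbers, head currently at track 50.
print(elevator_order(50, [95, 18, 60, 40, 75, 10]))
# [60, 75, 95, 40, 18, 10]
```

The head crosses the platter once in each direction instead of zig-zagging between requests in arrival order.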

With IDE it seems that this caching algorithm is also beneficial, but 
it really shines with SCSI drives.

In all my experience, SCSI drives are much faster and far more 
reliable than IDE drives.  I've always assumed that it boils down to "you 
get what you pay for."

-- 
Randolf Richardson - [EMAIL PROTECTED]
Inter-Corporate Computer  Network Services, Inc.
Vancouver, British Columbia, Canada
http://www.8x.ca/

This message originated from within a secure, reliable,
high-performance network ... a Novell NetWare network.




Re: [GENERAL] SCSI vs. IDE performance test

2003-10-28 Thread Allen Landsidel
Tom, this discussion brings up something that's been bugging me about the 
recommendations for getting more performance out of PG.. in particular the 
one that suggests you put your WAL files on a different physical drive from 
the database.

Consider the following scenario:
Database on drive1
WAL on drive2
1. PG write of some sort occurs.
2. PG writes out the WAL.
3. PG writes out the data.
4. PG updates the WAL to reflect data actually written.
5. System crashes/reboots/whatever.
With the DB and the WAL on different drives, it seems possible to me that 
drive2 could've fsync()'d or otherwise properly written all of the data 
out, but drive1 could have failed somewhere along the way and not actually 
written the data to the DB.

The next time PG is brought up, the WAL would indicate the transaction, as 
it were, was a success.. but the data wouldn't actually be there.

In the case of using only one drive, the rollback (from a FS perspective) 
couldn't possibly occur in such a way as to leave step 4 as a success, but 
step 3 as a failure -- worst case, the data would be written out but the 
WAL wouldn't have been updated (rolled back say by the FS) and thus PG will 
roll back the data itself, or use whatever mechanism it uses to ensure the 
data is consistent with the WAL.

Am I smoking something here or is this a real, if rare in practice, risk 
that occurs when you have the WAL on a different drive than the data is on?

At 17:39 10/27/2003, Tom Lane wrote:
Rick Gigger [EMAIL PROTECTED] writes:
 It seems to me file system journaling should fix the whole problem by giving
 you a record of what was actually committed to disk and what was not.

Nope, a journaling FS has exactly the same problem Postgres does
(because the underlying WAL concept is the same: write the log entries
before you change the files they describe).  If the drive lies about
write order, the FS can be screwed just as badly.  Now the FS code might
have a low-level way to force write order that Postgres doesn't have
access to ... but simply uttering the magic incantation "journaling file
system" will not make this problem disappear.
regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
   http://www.postgresql.org/docs/faqs/FAQ.html




Re: [GENERAL] SCSI vs. IDE performance test

2003-10-28 Thread Rick Gigger
Thanks!  Now it is much, much more clear.  It leaves me with a few
additional questions though.

Question 1:
we have no portable means of expressing that exact constraint to the
kernel
Does this mean that specific operating systems have a better way of dealing
with this?  Which ones and how?  I'm guessing that it couldn't make too big
of a performance difference or it would probably be implemented already.

Question 2:
Do serial ATA drives suffer from the same issue?



- Original Message - 
From: Tom Lane [EMAIL PROTECTED]
To: Rick Gigger [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 27, 2003 5:05 PM
Subject: Re: [GENERAL] SCSI vs. IDE performance test


 Rick Gigger [EMAIL PROTECTED] writes:
  ahhh. "lies about write order" is the phrase that I was looking for.
  That seemed to make sense but I didn't know if I could go directly from
  lying about fsync to that.  Obviously I don't understand exactly what
  fsync is doing.

 What we actually care about is write order: WAL entries have to hit the
 platter before the corresponding data-file changes do.  Unfortunately we
 have no portable means of expressing that exact constraint to the
 kernel.  We use fsync() (or related constructs) instead: issue the WAL
 writes, fsync the WAL file, then issue the data-file writes.  This
 constrains the write ordering more than is really needed, but it's the
 best we can do in a portable Unix application.
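
A minimal sketch of the sequence just described (issue the WAL writes, fsync the WAL file, then issue the data-file writes), using Python's standard `os.fsync`. The function name, file names, and record format are hypothetical and this is not PostgreSQL's code; it only illustrates the ordering.

```python
import os
import tempfile


def write_with_wal(wal_path, data_path, record: bytes) -> None:
    """Log a change durably before touching the data file."""
    # 1. Append the WAL entry and force it down to the drive.
    with open(wal_path, "ab") as wal:
        wal.write(record + b"\n")
        wal.flush()              # Python buffer -> kernel
        os.fsync(wal.fileno())   # kernel -> drive (as far as the drive admits)

    # 2. Only after fsync has returned do we issue the data-file write.
    #    It does not need to be forced out immediately.
    with open(data_path, "ab") as data:
        data.write(record + b"\n")


with tempfile.TemporaryDirectory() as tmp:
    wal_file = os.path.join(tmp, "wal.log")       # hypothetical file names
    data_file = os.path.join(tmp, "table.dat")
    write_with_wal(wal_file, data_file, b"row 1 = new value")
```

Note that the guarantee is only as good as the drive's honesty: if the drive reports completion early, `os.fsync` returns early too.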

 The problem is that the kernel thinks fsync is done when the disk drive
 reports the writes are complete.  When we say a drive lies about this,
 we mean it accepts a sector of data into its on-board RAM and then
 immediately claims write-complete, when in reality the data hasn't hit
 the platter yet and will be lost if power dies before the drive gets
 around to writing it.

 So we can have a scenario where we think WAL is down to disk and go
 ahead with issuing data-file writes.  These will also be shoved over to
 the drive and stored in its on-board RAM.  Now the drive has multiple
 sectors pending write in its buffers.  If it chooses to write these in
 some order other than the order they were given to it, it could write
 the data file updates to disk first.  If power drops *now*, we lose,
 because the data files are inconsistent and there's no WAL entry to tell
 us to fix it.

 Got it?  It's really the combination of "lie about write completion" and
 "write pending sectors out of order" that can mess things up.
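
The failure scenario above can be reproduced with a toy model. The class below is purely illustrative (it is not a real device API, and the sector numbers are arbitrary): it acknowledges writes as soon as they reach its cache and then commits them in an order of its own choosing.

```python
class LyingDrive:
    """Toy model of a drive with a write-back cache."""

    def __init__(self):
        self.cache = []      # acknowledged writes not yet on the platter
        self.platter = {}    # what would survive a power loss

    def write(self, sector, payload):
        self.cache.append((sector, payload))
        return "complete"    # the lie: reported before the data is durable

    def flush_one(self):
        # The drive picks whichever pending write suits it. Here that is
        # the highest sector number, standing in for "nearest to the head".
        self.cache.sort(key=lambda w: w[0])
        sector, payload = self.cache.pop()
        self.platter[sector] = payload

    def power_loss(self):
        self.cache.clear()   # everything still in on-board RAM is gone


drive = LyingDrive()
WAL_SECTOR, DATA_SECTOR = 10, 900

drive.write(WAL_SECTOR, "WAL: update row 1")    # issued first
drive.write(DATA_SECTOR, "row 1 = new value")   # issued second
drive.flush_one()      # the drive commits the data sector first
drive.power_loss()

print(sorted(drive.platter))   # [900]
```

After the simulated power loss the data-file sector is on the platter but the WAL sector is not, which is exactly the inconsistent state described above.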

 The reason IDE drives have to do this for reasonable performance is that
 the IDE interface is single-threaded: you can only have one read or
 write in process at a time, from the point of view of the
 kernel-to-drive interface.  But in order to schedule reads and writes in
 a way that makes sense physically (minimizes seeks), the drive has to
 have multiple read and write requests pending that it can pick and
 choose from.  The only possibility to do that in the IDE world is to
 let a write complete in interface terms before it's really done ...
 that is, lie.

 The reason SCSI drives do *not* do this is that the SCSI interface is
 logically multi-threaded: you can have multiple reads or writes pending
 at once.  When you want to write on a SCSI drive, you send over a
 command that says "write this data at this sector".  Sometime later the
 drive sends back a status report "yessir boss, I done did that write".
 Similarly, a read consists of a command "read this sector", followed
 sometime later by a response that delivers the requested data.  But you
 can send other commands to read or write other sectors meanwhile, and
 the drive is free to reorder them to suit its convenience.  So in the
 SCSI world, there is no need for the drive to lie in order to do its own
 read/write scheduling.  The kernel knows the truth about whether a given
 sector has hit disk, and so it won't conclude that the WAL file has been
 completely fsync'd until it really is all down to the platter.

 This is also why SCSI disks shine on the read side when you have lots of
 processes doing reads: in an IDE drive, there is no way for the drive to
 satisfy read requests in any order but the one they're issued in.  If the
 kernel guesses wrong about the best ordering for a set of read requests,
 then everybody waits for the seeks needed to get the earlier processes'
 data.  A SCSI drive can fetch the nearest data first, and then that
 requester is freed to make progress in the CPU while the other guys wait
 for their longer seeks.  There's no win here with a single active user
 process (since it probably wants specific data in a specific order), but
 it's a huge win if lots of processes are making unrelated read requests.
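
To put rough numbers on the read-side argument, the sketch below compares total head travel when requests are served strictly in issue order (the IDE case as described) against serving the nearest pending request first (something a drive with several outstanding commands is free to do). The track numbers are invented for illustration, and real drives also weigh rotational position, which this ignores.

```python
def nearest_first_order(head, requests):
    """Repeatedly serve whichever pending request is closest to the head."""
    pending = list(requests)
    order = []
    while pending:
        nxt = min(pending, key=lambda track: abs(track - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order


def head_travel(head, order):
    """Total distance the head moves, in tracks, serving `order` in sequence."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


requests = [950, 20, 900, 40, 870]   # unrelated reads from five processes
start = 500

print(head_travel(start, requests))                                # 3950
print(head_travel(start, nearest_first_order(start, requests)))    # 1380
```

With a single process issuing reads in the order it needs them, there is nothing to reorder, which matches the "no win with a single active user process" remark.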

 Clear now?

 (In a previous lifetime I wrote SCSI disk driver code ...)

 regards, tom lane



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] SCSI vs. IDE performance test

2003-10-28 Thread Tom Lane
Rick Gigger [EMAIL PROTECTED] writes:
 we have no portable means of expressing that exact constraint to the
 kernel
 Does this mean that specific operating systems have a better way of dealing
 with this?  Which ones and how?

I'm not aware of any that offer a way of expressing "write these
particular blocks before those particular blocks".  It doesn't seem like
it would require rocket scientists to devise such an API, but no one's
got round to it yet.  Part of the problem is that the issue would have
to be approached at multiple levels: there is no point in offering an
OS-level API for this when the hardware underlying the bus-level API
(IDE) is doing its level best to sabotage the entire semantics.

 Do serial ATA drives suffer from the same issue?

Um, not an expert, but I think ATA is the same as IDE except for bus
width and transfer rate.  If either one allows for multiple concurrent
read/write transactions I'll be very surprised.

regards, tom lane



Re: [GENERAL] SCSI vs. IDE performance test

2003-10-28 Thread Martijn van Oosterhout
On Tue, Oct 28, 2003 at 12:17:59AM -0500, Tom Lane wrote:
 Rick Gigger [EMAIL PROTECTED] writes:
  Do serial ATA drives suffer from the same issue?
 
 Um, not an expert, but I think ATA is the same as IDE except for bus
 width and transfer rate.  If either one allows for multiple concurrent
 read/write transactions I'll be very surprised.

Well, some googling around seems to indicate that Serial ATA I/ATA-6 has
Tagged Command Queueing (TCQ) which is adding this feature specifically.
Whether it is a mandatory part of the spec I don't know.

-- 
Martijn van Oosterhout   [EMAIL PROTECTED]   http://svana.org/kleptog/
 All that is needed for the forces of evil to triumph is for enough good
 men to do nothing. - Edmund Burke
 The penalty good people pay for not being interested in politics is to be
 governed by people worse than themselves. - Plato




Re: [GENERAL] SCSI vs. IDE performance test

2003-10-28 Thread Tom Lane
Martijn van Oosterhout [EMAIL PROTECTED] writes:
 Well, some googling around seems to indicate that Serial ATA I/ATA-6 has
 Tagged Command Queueing (TCQ) which is adding this feature specifically.
 Whether it is a mandatory part of the spec I don't know.

Yeah?  If so, and *if fully implemented* on both sides of the interface,
this would eliminate the architectural advantages I was just sketching
for SCSI.  I can't claim to be up on what's happening in the IDE/ATA
world though...

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] SCSI vs. IDE performance test

2003-10-27 Thread Rick Gigger
It seems to me file system journaling should fix the whole problem by giving
you a record of what was actually committed to disk and what was not.  I must
not understand journaling correctly.  Can anyone explain to me how
journaling works?

- Original Message - 
From: Bruce Momjian [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Stephen [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, October 27, 2003 12:14 PM
Subject: Re: [GENERAL] SCSI vs. IDE performance test


 Mike Benoit wrote:
  I just ran some benchmarks against a 10K SCSI drive and 7200 RPM IDE
  drive here:
 
  http://fsbench.netnation.com/
 
  The results vary quite a bit, and it seems the file system you use
  can make a huge difference.
 
  SCSI is obviously faster, but a 20% performance gain for 5x the cost is
  only worth it for a very small percentage of people, I would think.

 Did you turn off the IDE write cache?  If not, the SCSI drive is
 reliable in case of OS failure, while the IDE is not.

 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] SCSI vs. IDE performance test

2003-10-24 Thread Dann Corbit
 -Original Message-
 From: Stephen [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, October 22, 2003 9:02 AM
 To: [EMAIL PROTECTED]
 Subject: Re: [GENERAL] SCSI vs. IDE performance test
 
 
 The SCSI improvement over IDE seems overrated in the test. I 
 would have expected at most a 30% improvement. Other reviews 
 seem to point out that IDE performs just as well or better.
 
 See Tom's hardware: 
 http://www20.tomshardware.com/storage/20020305 /index.html
 

My own tests show that 15K RPM ultra 320 SCSI drives are considerably
faster than any IDE storage.

This ATA drive:
http://www.wdc.com/en/products/WD360GD.asp

performs as well as or better than many SCSI drives, and is not terribly
expensive.  Therefore, it is a very good choice if price/performance
is more important than absolute performance.

But if you need absolute horsepower, then one of these (or other 15K
Ultra320 equivalent) won't be beaten:
http://www.storagereview.com/articles/200304/200304068C073x0_1.html


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [GENERAL] SCSI vs. IDE performance test

2003-10-24 Thread Dann Corbit
Unwrap this link (if your newsreader folds it) and click on it for hard
drive performance:
http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=1
0testbedID=3osID=4raidconfigID=1numDrives=1devID_0=232devID_1=237
devID_2=213devID_3=221devID_4=216devID_5=249devID_6=250devCnt=7

The important part for databases is the Server Suite.



Re: [GENERAL] SCSI vs. IDE performance test

2003-10-24 Thread Dennis Gearon
Dann Corbit wrote:

Unwrap this link (if your newsreader folds it) and click on it for hard
drive performance:
http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=1
0testbedID=3osID=4raidconfigID=1numDrives=1devID_0=232devID_1=237
devID_2=213devID_3=221devID_4=216devID_5=249devID_6=250devCnt=7
The important part for database is Server Suite

 

Fairly old data, but it shows AMAZING differences in head seek time. I 
didn't know head seeks were below 8ms for anything, even today. Also, 
from what I've read, the SATA drives of those days were non existent? 
The earliest SATA drives I've read about were just SATA interfaces on 
OLDER IDE hardware - the manufacturers had not really signed up on the 
concept enough to put their good hardware underneath the interface.

--
You are behaving like a man,
is an insult from some women,
a compliment from a good woman.

