Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-16 Thread mudfoot
Yeah, those big disk arrays are real sweet.

One day last week I was in a data center in Arizona when the big LSI/Storagetek
array in the cage next to mine had a hard drive failure.  So the alarm shrieked
at like 13225535 decibels continuously for hours.  BEEEP BP BP BP. 
Of course, since this was a colo facility, it wasn't staffed on site by the idiots
who own the array.  BP BEEEP BEEEP for hours.  So I had to stand
next to this thing--only separated by a few feet and a little wire mesh--while
it shrieked for hours until a knuckle-dragger arrived on site to swap the drive.

Yay.

So if you're going to get a fancy array (they're worth it if somebody else is
paying) then make sure to *turn off the @[EMAIL PROTECTED]'ing audible alarm*
if you deploy it in a colo facility.

Quoting Scott Marlowe <[EMAIL PROTECTED]>:

> On Wed, 2005-11-16 at 12:51, Steinar H. Gunderson wrote:
> > On Wed, Nov 16, 2005 at 11:06:25AM -0600, Scott Marlowe wrote:
> > > There was a big commercial EMC style array in the hosting center at the
> > > same place that had something like a 16 wide by 16 tall array of IDE
> > > drives for storing pdf / tiff stuff on it, and we had at least one
> > > failure a month in it.  Of course, that's 256 drives, so you're gonna
> > > have failures, and it was configured with a spare on every other row or
> > > some such.  We just had a big box of hard drives and it was smart enough
> > > to rebuild automagically when you put a new one in, so the maintenance
> > > wasn't really that bad.  The performance was quite impressive too.
> > 
> > If you have a cool SAN, it alerts you and removes all data off a disk
> > _before_ it starts giving hard failures :-)
> 
> Yeah, I forget who made the unit we used, but it was pretty much fully
> automated.  It was something like a large RAID 5+0 (0+5???) and would
> send an alert when a drive died or started getting errors, and the bad
> drive's caddy would be flashing red instead of steady green.
> 
> I just remember thinking that I'd never used a drive array that was
> taller than I was before that.
> 



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Is There Any Way ....

2005-09-29 Thread mudfoot
Quoting Lane Van Ingen <[EMAIL PROTECTED]>:

> ... to do the following:
>   (1) Make a table memory-resident only ?

Put it on a RAM filesystem.  On Linux, shmfs.  On *BSD, mfs.  On Solaris, tmpfs.

>   (2) Set up user variables in memory that are persistent across all
> sessions, for
>   as long as the database is up and running ?

This sounds like a client thing?  Dunno.

>   (3) Assure that a disk-based table is always in memory (outside of
>   keeping it in memory buffers as a result of frequent activity which
>   would prevent LRU operations from taking it out) ?
> 

Put it on a RAM fs (see question 1).

Basically, RAM filesystems live in RAM, meaning you need enough physical
memory to support them.  And of course their contents completely disappear
between reboots, so you'll need a way to populate them on bootup and make sure
that your updates also go to a real nonvolatile storage medium (like disks).  You
might also get swapping on some types of memory filesystems--Solaris' tmpfs is
carved out of virtual memory, which means it will cause swapping if tmpfs
contents plus the rest of your applications exceed physical memory.
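
If you'd rather keep the table on regular disk and just keep it warm in the OS
buffer cache (question 3 without the RAM fs), one crude trick is a little
looping reader that touches the table's files periodically.  Purely a sketch,
not the answer above--the default path and the interval are invented; you'd
point it at the relation's files under your data directory:

/* warmcache.c -- keep a file's pages hot in the OS buffer cache.
   Sketch only: the default path and the 60-second interval are invented. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    char buf[8192];
    const char *path = (argc > 1) ? argv[1] : "/data/pgdata/base/1/16384";

    for (;;) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror("open"); exit(1); }
        while (read(fd, buf, sizeof buf) > 0)
            ;                  /* touch every block so it stays cached */
        close(fd);
        sleep(60);             /* re-touch once a minute */
    }
}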

> 
> 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread mudfoot
Bear in mind you will lose data if the RAID controller itself fails (or if the
cache memory module does).  Many solutions have mirrored cache for this reason.
But that's more $$, depending on the risks you want to take.

Quoting Arjen van der Meijden <[EMAIL PROTECTED]>:

> On 14-9-2005 22:03, Alvaro Herrera wrote:
> > On Wed, Sep 14, 2005 at 11:28:43AM -0700, Jeffrey W. Baker wrote:
> > 
> >>On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:
> >>
> >>Actually the cache will just hold its contents while the power is out.
> >>When the power is restored, the RAID controller will complete the writes
> >>to disk.  If the battery does not last through the outage, the data is
> >>lost.
> > 
> > 
> > Just curious: how long are the batteries supposed to last?
> 
> For the LSI-Logic MegaRaid 320-2e it's about 72 hours for the standard
> 128MB version.  Their SATA2 solution offers 32 and 72 hour options.  So I
> assume it's "on the order of days" for most RAID controllers.
> 
> Best regards,
> 
> Arjen van der Meijden
> 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] High load and iowait but no disk access

2005-08-30 Thread mudfoot
This might be optimal behavior from the hardware.  Random reads are hard to
optimize for--except if you have enough physical memory to hold the entire
dataset.  Cached reads (either in the array controller or the OS buffer cache)
should return nearly immediately.  But random reads probably aren't cached.  And
any read-ahead algorithms or other performance enhancements in the hardware or
OS go out the window--because the behavior isn't predictable.

Each time a drive's head needs to seek to a new track, it takes at least a
couple of milliseconds.  Sequential reads only require this movement
infrequently.  But random reads may be forcing this movement for every IO
operation.

Since the bottleneck in random reads is the physical hard drives themselves,
everything else stands around waiting.  Fancy hardware can optimize everything
else -- writes with write cache, sequential reads with read-ahead and read
cache.  But there's no real solution to a purely random read workload except
perhaps creating different disk groups to help avoid spindle contention.

I like this tool:  http://www.soliddata.com/products/iotest.html
It allows you to select pure workloads (read/write/sequential/random), and it
runs against raw devices, so you bypass the OS buffer cache.  When I've run it
I've always seen sequential activity get much much higher throughput than 
random.
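
If you don't have a tool like that handy, the same comparison can be roughed
out in a few lines of C.  This is just a sketch--/dev/sdb and the 4GB probe
range are invented, and note that reads through a plain block device may still
hit the OS buffer cache (the tool above uses raw devices to sidestep that):

/* seekbench.c -- compare sequential vs. random 8KB reads (sketch only). */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/time.h>

#define BLK    8192
#define NREADS 1000
#define SPAN   ((off_t)4 * 1024 * 1024 * 1024)   /* probe 4GB of the device */

static double secs(struct timeval a, struct timeval b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_usec - a.tv_usec) / 1e6;
}

int main(void)
{
    char buf[BLK];
    struct timeval t0, t1;
    int i, fd = open("/dev/sdb", O_RDONLY);   /* placeholder device path */
    if (fd < 0) { perror("open"); return 1; }

    gettimeofday(&t0, NULL);
    for (i = 0; i < NREADS; i++)              /* sequential pass */
        read(fd, buf, BLK);
    gettimeofday(&t1, NULL);
    printf("sequential: %.3f s\n", secs(t0, t1));

    srand(42);
    gettimeofday(&t0, NULL);
    for (i = 0; i < NREADS; i++) {            /* random pass: seek, then read */
        lseek(fd, ((off_t)rand() % (SPAN / BLK)) * BLK, SEEK_SET);
        read(fd, buf, BLK);
    }
    gettimeofday(&t1, NULL);
    printf("random:     %.3f s\n", secs(t0, t1));
    close(fd);
    return 0;
}

On spinning disks the random pass should come out dramatically slower, for
exactly the seek-latency reasons above.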

Quoting Anjan Dave <[EMAIL PROTECTED]>:

> I have seen references of changing the kernel io scheduler at boot time...not
> sure if it applies to RHEL3.0, or will help, but try setting
> 'elevator=deadline' during boot time or via grub.conf. Have you tried running
> a simple 'dd' on the LUN? The drives are in RAID10 configuration, right?
> 
>  
> 
> Thanks,
> 
> Anjan
> 
>   _  
> 
> From: Woody Woodring [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, August 30, 2005 2:30 PM
> To: 'Rémy Beaumont'; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] High load and iowait but no disk access
> 
>  
> 
> Have you tried a different kernel?  We run with a netapp over NFS without any
> issues, but we have seen high IO-wait on other Dell boxes (running  and not
> running postgres) and RHES 3.  We have replaced a Dell PowerEdge 350 running
> RH 7.3 with a PE750 with more memory running RHES3, and it would get bogged
> down with IO waits due to syslog messages writing to the disk; the old,
> slower server could handle it fine.  I don't know if it is a Dell thing or a RH
> kernel, but we try different kernels on our boxes to try to find one that
> works better.  We have not found one that stands out over another
> consistently but we have been moving away from Update 2 kernel
> (2.4.21-15.ELsmp) due to server lockup issues.  Unfortunately we get the best
> disk throughput on our few remaining 7.3 boxes.
> 
>  
> 
> Woody
> 
>  
> 
> IGLASS Networks
> 
> www.iglass.net
> 
>  
> 
>   _  
> 
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Rémy Beaumont
> Sent: Monday, August 29, 2005 9:43 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] High load and iowait but no disk access
> 
> We have been trying to pinpoint what originally seemed to be an I/O bottleneck
> but which now seems to be an issue with either Postgresql or RHES 3.
> 
> We have the following test environment on which we can reproduce the
> problem:
> 
> 1) Test System A
> Dell 6650 Quad Xeon Pentium 4
> 8 Gig of RAM
> OS: RHES 3 update 2
> Storage: NetApp FAS270 connected using an FC card using 10 disks
> 
> 2) Test System B
> Dell Dual Xeon Pentium III
> 2 Gig of RAM
> OS: RHES 3 update 2
> Storage: NetApp FAS920 connected using an FC card using 28 disks
> 
> Our database size is around 30G.
> 
> The behavior we see is that when running queries that do random reads on
> disk, IOWAIT goes over 80% and actual disk IO falls to a crawl at a
> throughput below 3000kB/s (we usually average 4 kB/s to 8 kB/s on
> sequential read operations on the netapps).
> 
> The stats of the NetApp do confirm that it is sitting idle.  Doing an strace
> on the Postgresql process shows that it is doing seeks and reads.
> 
> So my question is: where is this iowait time spent?
> Is there a way to pinpoint the problem in more detail?
> We are able to reproduce this behavior with Postgresql 7.4.8 and 8.0.3.
> 
> I have included the output of top,vmstat,strace and systat from the Netapp
> from System B while running a single query that generates this behavior.
> 
> Rémy
> 
> top output:
> 06:27:28 up 5 days, 16:59, 6 users, load average: 1.04, 1.30, 1.01
> 72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
> CPU states: cpu user nice system irq softirq iowait idle
> total 2.7% 0.0% 1.0% 0.1% 0.2% 46.0% 49.5%
> cpu00 0.2% 0.0% 0.2% 0.0% 0.2% 2.2% 97.2%
> cpu01 5.3% 0.0% 1.9% 0.3% 0.3% 89.8% 1.9%
> Mem: 2061696k av, 2043936k used, 17760k free, 0k shrd, 3916k buff
> 1566332k actv, 296648k in_d, 30504k in_c
> Swap: 16771584k av, 21552k used, 16750032k free 1933772k cached
> 
> PID USER PRI NI S

Re: [PERFORM] Performance problem using V3 protocol in jdbc driver

2005-08-17 Thread mudfoot
Quoting Barry Lind <[EMAIL PROTECTED]>:


>  
> 
> What I see when running the V3 protocol under 'top' is that the postgres
> processes are routinely using 15% or more of the CPU each, when running
> the V2 protocol they use more like 0.3%.
> 
>  
> 
> Does anyone have any suggestions on an approach to debug a problem like
> this?
> 
>  

Tracing system calls is a good starting point--truss on Solaris, strace on Linux
(Redhat anyway), ktrace on BSD.  The difference between 0.3% and 15% CPU
utilization under similar load will very likely (though not with complete
certainty) show noticeably different system call activity.

If you notice a difference in system call activity, then that would probably
provide a hint as to what's going on--where the inefficiency lies.  It's
possible to spin the CPU up without making any system calls, but system call
tracing can be done quickly, and any interesting patterns should emerge right
away.

^
|

This method is a good starting point for troubleshooting just about any funny
process activity.  And it comes with the added benefit of not having to know
ahead of time about the specific matter at hand (JDBC implementation, in this
case).  :-)  That's having your cake and eating it, too.

> 
> Thanks,
> 
> --Barry
> 
> 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] choosing RAID level for xlogs

2005-08-16 Thread mudfoot
Quoting Anjan Dave <[EMAIL PROTECTED]>:

> Hi,
> 
>  
> 
> One simple question. For 125 or more checkpoint segments
> (checkpoint_timeout is 600 seconds, shared_buffers are at 21760 or
> 170MB) on a very busy database, what is more suitable, a separate 6 disk
> RAID5 volume, or a RAID10 volume? Databases will be on separate
> spindles. Disks are 36GB 15KRPM, 2Gb Fiber Channel. Performance is
> paramount, but I don't want to use RAID0.
> 

RAID10 -- no question.  xlog activity is overwhelmingly sequential 8KB writes.
In order for RAID5 to perform a small write, the host (or controller) needs to
do extra work for parity: read the old data and old parity blocks, recompute
the parity, and write both back.  This turns into latency.  RAID10 performs no
such read-modify-write cycle.
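
The parity itself is just an XOR across the stripe; the cost is the
read-modify-write dance around it.  A toy illustration of what the controller
does for each small write--real controllers do this in firmware, the C is only
for clarity:

/* raid5_parity.c -- the extra work behind one small RAID5 write (sketch). */
#include <stdio.h>

#define BLK 8192

/* new_parity = old_parity XOR old_data XOR new_data */
static void raid5_small_write(const unsigned char *old_data,   /* read: 1 I/O */
                              const unsigned char *old_parity, /* read: 1 I/O */
                              const unsigned char *new_data,
                              unsigned char *new_parity)
{
    int i;
    for (i = 0; i < BLK; i++)
        new_parity[i] = old_parity[i] ^ old_data[i] ^ new_data[i];
    /* then write new_data (1 I/O) and new_parity (1 I/O): four disk ops
       plus the XOR pass, where RAID10 just writes both mirror copies. */
}

int main(void)
{
    static unsigned char od[BLK], op[BLK], nd[BLK], np[BLK];
    raid5_small_write(od, op, nd, np);
    printf("parity byte 0: %u\n", np[0]);
    return 0;
}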

>  
> 
> PG7.4.7 on RHAS 4.0
> 
>  
> 
> I can provide more info if needed.
> 
>  
> 
> Appreciate some recommendations!
> 
>  
> 
> Thanks,
> 
> Anjan
> 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] JFS fastest filesystem for PostgreSQL?

2005-07-14 Thread mudfoot


Quoting "Jeffrey W. Baker" <[EMAIL PROTECTED]>:


> 
> Here's the result, in transactions per second.
> 
>              ext3   jfs   xfs
> ------------------------------
>  10 Clients   55     81    68
> 100 Clients   61    100    64
> 

Was fsync true?  And have you tried ext2?  Legend has it that ext2 is the
fastest thing going for synchronous writes (besides O_DIRECT or raw) because
there's no journal.

> 
> -jwb
> 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Multiple disks: RAID 5 or PG Cluster

2005-06-17 Thread mudfoot
If you truly do not care about data protection -- either from drive loss or from
sudden power failure, or anything else -- and just want to get the fastest
possible performance, then do RAID 0 (striping).  It may be faster to do that
with software RAID on the host than with a special RAID controller.  And turn
off fsyncing the write ahead log in postgresql.conf (fsync = false).

But be prepared to replace your whole database from scratch (or backup or
whatever) if you lose a single hard drive.  And if you have a sudden power loss
or other type of unclean system shutdown (kernel panic or something) then your
data integrity will be at risk as well.

To squeeze even a little bit more performance, put your operating system, swap,
and PostgreSQL binaries on a cheap IDE or SATA drive--and only your data on the
5 striped SCSI drives.

I do not know what clustering would do for you.  But striping will ensure that
each of your hard drives processes a roughly equal share of the IO operations.
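
That even spread falls straight out of the striping arithmetic.  A toy
illustration (the stripe unit size is just an example):

/* stripe_map.c -- where RAID0 puts each logical block (sketch). */
#include <stdio.h>

#define NDISKS 5          /* the five SCSI drives */
#define STRIPE_BLKS 8     /* 8 x 8KB = 64KB stripe unit, for illustration */

int main(void)
{
    long blk;
    for (blk = 0; blk < 40; blk++) {
        int disk   = (blk / STRIPE_BLKS) % NDISKS;   /* round-robin by stripe */
        long inner = (blk / STRIPE_BLKS) / NDISKS * STRIPE_BLKS
                     + blk % STRIPE_BLKS;            /* offset on that disk */
        printf("logical blk %3ld -> disk %d, disk blk %ld\n", blk, disk, inner);
    }
    return 0;
}

Every run of consecutive blocks fans out round-robin across all five spindles,
so any reasonably large IO keeps every disk busy.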

Quoting Yves Vindevogel <[EMAIL PROTECTED]>:

> Hi,
> 
> We are looking to build a new machine for a big PG database.
> We were wondering if a machine with 5 scsi-disks would perform better 
> if we use a hardware raid 5 controller or if we would go for the 
> clustering in PG.
> If we cluster in PG, do we have redundancy on the data like in a RAID 5?
> 
> First concern is performance, not redundancy (we can do that a 
> different way because all data comes from upload files)
> 
> Met vriendelijke groeten,
> Bien à vous,
> Kind regards,
> 
> Yves Vindevogel
> Implements
> 
> 



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Adaptec/LSI/?? RAID (what about JBOD?)

2005-06-02 Thread mudfoot
Thanks, Andrew.  I expect to choose between HBAs with no RAID functionality or
with the option to completely bypass RAID functionality--meaning that I'll
hopefully avoid the situation that you've described.  I'm mostly curious as to
whether the driver problems described for U320 Adaptec RAID controllers also
apply to the regular SCSI drivers.

Thanks.

Quoting Andrew McMillan <[EMAIL PROTECTED]>:

> On Thu, 2005-06-02 at 14:02 -0700, [EMAIL PROTECTED] wrote:
> > I have a similar question about what to choose (either LSI or Adaptec
> > U320), but plan to use them just for JBOD drives.  I expect to be using
> > either NetBSD or FreeBSD.  The system CPU will be Opteron.  My impression
> > is that both the ahd and mpt drivers (for U320 Adaptec and LSI,
> > respectively) are quite stable, but not from personal experience.  Like I
> > said, I don't plan to have the cards doing RAID in hardware.  Should I be
> > pretty safe with either choice of HBA then?
> 
> On the machine I mentioned earlier in this thread we use the Megaraid
> for JBOD, but the card setup to use the disks that way was somewhat
> confusing, requiring us to configure logical drives that in fact matched
> the physical ones.  The card still wanted to write that information onto
> the disks, reducing the total disk space available by some amount, but
> also meaning that we were unable to migrate our system from a previous
> non-RAID card cleanly.
> 
> Regards,
>   Andrew.
> 
> -
> Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
> WEB: http://catalyst.net.nz/PHYS: Level 2, 150-154 Willis St
> DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
> Whereof one cannot speak, thereon one must remain silent. -- Wittgenstein
> -
> 
> 



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Adaptec/LSI/?? RAID (what about JBOD?)

2005-06-02 Thread mudfoot
I have a similar question about what to choose (either LSI or Adaptec U320),
but plan to use them just for JBOD drives.  I expect to be using either NetBSD
or FreeBSD.  The system CPU will be Opteron.  My impression is that both the
ahd and mpt drivers (for U320 Adaptec and LSI, respectively) are quite stable,
but not from personal experience.  Like I said, I don't plan to have the cards
doing RAID in hardware.  Should I be pretty safe with either choice of HBA then?

Thanks (and sorry for the semi-hijack).


Quoting Vivek Khera <[EMAIL PROTECTED]>:

> 
> On Jun 1, 2005, at 11:42 PM, Stacy White wrote:
> 
> > "we find LSI does not work well with 4GB of RAM. Our engineering  
> > find that
> > LSI card could cause system crashes. One of our customer ... has  
> > found that
> > Adaptec cards works well on PostGres SQL -- they're using it as a  
> > preforce
> > server with xfs and post-gress."
> >
> > Any comments?  Suggestions for other RAID controllers?
> >
> 
> I have twin dual opteron, 4GB RAM, LSI MegaRAID-2X cards with 8 disks  
> ([EMAIL PROTECTED] system+pg_xlog, [EMAIL PROTECTED] data) running FreeBSD 
> 5.4-RELEASE.
> 
> Works just perfectly fine under some very heavy insert/update/delete  
> load.  Database + indexes hovers at about 50Gb.
> 
> I don't use the adaptec controllers because they don't support  
> FreeBSD well (and vice versa) and the management tools are not there  
> for FreeBSD in a supported fashion like they are for LSI.
> 
> 
> Vivek Khera, Ph.D.
> +1-301-869-4449 x806
> 
> 
> 



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-09 Thread mudfoot
Quoting Randolf Richardson <[EMAIL PROTECTED]>:

>   I'm looking for recent performance statistics on PostgreSQL vs. Oracle
> vs. Microsoft SQL Server.  Recently someone has been trying to convince my

I don't know anything about your customer's requirements other than that they
currently have a DB and that somebody is trying to get them to switch to
another.

I don't think you'll find meaningful numbers unless you do your own benchmarks.
DB performance is largely determined by how the application functions, the
hardware, the OS, and the DBA's familiarity with the platform.  I would suspect
that for any given workload on relatively similar hardware, just about any of
the DBs you mention would perform similarly if tuned appropriately.

> client to switch from SyBASE to Microsoft SQL Server (they originally wanted
> to go with Oracle but have since fallen in love with Microsoft).  All this
> time I've been recommending PostgreSQL for cost and stability (my own testing
> has shown it to be better at handling abnormal shutdowns and using fewer
> system resources) in addition to true cross-platform compatibility.

Right for the customer?  How about "Don't fix it if it ain't broke"?  Replacing
a DB backend isn't always trivial (understatement).  I suppose if their
application is very simple and uses few if any proprietary features of Sybase
then changing the DB would be simple.  That depends heavily on the application.
In general, though, you probably shouldn't rip and replace DB platforms unless
there's a very good strategic reason.

I don't know about MSSQL, but I know that, if managed properly, Sybase and
Oracle can be pretty rock-solid and high performing.  If *you* have found FooDB
to be the most stable and highest performing, then that probably means that
FooDB is the one you're most familiar with rather than FooDB being the best in
all circumstances.  PostgreSQL is great.  I love it.  In the right hands and
under the right circumstances, it is the best DB.  So is Sybase.  And Oracle. 
And MSSQL.

> 
>   If I can show my client some statistics that PostgreSQL outperforms
> these (I'm more concerned about it beating Oracle because I know that
> Microsoft's stuff is always slower, but I need the information anyway to
> protect my client from falling victim to a 'sales job'), then PostgreSQL will
> be the solution of choice as the client has always believed that they need a
> high-performance solution.
> 

Unless there's a really compelling reason to switch, optimizing what they
already have is probably the best thing for them.  They've already paid for it.
They've already written their own application and have some familiarity with
managing the DB.  According to Sybase, Sybase is the fastest thing going. :-)
Which is probably pretty close to the truth if the application and DB are tuned
appropriately.

>   I've already convinced them on the usual price, cross-platform
> compatibility, open source, long history, etc. points, and I've been assured
> that if the performance is the same or better than Oracle's and Microsoft's
> solutions that PostgreSQL is what they'll choose.

Are you telling me that they're willing to pay $40K per CPU for Oracle if it
performs 1% better than PostgreSQL, which is $0?  Not to mention throw away
Sybase, which is a highly scalable platform in and of itself.

The best DB platform is what they currently have, regardless of what they have,
unless there is a very compelling reason to switch.

> 
>   Thanks in advance.
> 



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Tryint to match Solaris-Oracle performance with directio?

2004-09-17 Thread mudfoot
I fully agree with Gaetano about testing sync methods.  From testing I've done
on two different Solaris 8 boxes, the O_DSYNC option on Solaris 8 beats fsync
and fdatasync easily.  Test it yourself though.  There's probably some
opportunity there for better performance for you.
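
The difference is mostly where the flush happens: fsync() is a second system
call after a plain write(), while O_DSYNC makes the write() itself synchronous.
A minimal sketch of the two styles ("walfile" is a made-up name):

/* syncstyles.c -- two ways to force an 8KB write to stable storage (sketch). */
#include <unistd.h>
#include <fcntl.h>

static char buf[8192];

int main(void)
{
    /* fsync style: plain write, then a second syscall to flush
       data and metadata. */
    int fd = open("walfile", O_WRONLY | O_CREAT, 0600);
    write(fd, buf, sizeof buf);
    fsync(fd);
    close(fd);

    /* open_datasync style: the flush happens inside write() itself,
       which returns only once the data is on stable storage. */
    fd = open("walfile", O_WRONLY | O_CREAT | O_DSYNC, 0600);
    write(fd, buf, sizeof buf);
    close(fd);
    return 0;
}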

> > BTW this is Postgres 7.4.1, and our customers are Solaris 8 and 9.
> 
> If you care about your data, upgrade to the more recent 7.4.5.
> 
> Test which sync method works best using /src/tools/fsync; do some
> experiments changing the sync method.  You can also avoid updating the
> access time on inodes by mounting the partition with the noatime option
> (this, however, has more impact on performance for read activities).
> 
> 
> Regards
> Gaetano Mendola
> 
> 
> 



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Article about PostgreSQL and RAID in Brazil

2004-09-16 Thread mudfoot
Quoting Josh Berkus <[EMAIL PROTECTED]>:

> The first and most important step for RAID performance with PostgreSQL is to
> get a card with onboard battery back-up and enable the write cache for the
> card.  You do not want to enable the write cache *without* battery back-up.

I'm curious about this -- how do you avoid losing data if a cache stick dies?
Without redundancy, whatever hasn't been destaged to the physical media
vanishes.  Dual-controller external arrays (HDS, EMC, LSI, etc.) tend to mirror
the cache (though algorithms vary) in addition to battery backup.  But do
onboard cards tend to do this as well?

Mark

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-13 Thread mudfoot
Quoting Bruce Momjian <[EMAIL PROTECTED]>:

> 
> Have you seen /src/tools/fsync?
> 

I have now.  Thanks.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

2004-09-12 Thread mudfoot
Hi, I'd like to help with the topic in the Subject: line.  It seems to be a
TODO item.  I've reviewed some threads discussing the matter, so I hope I've
acquired enough history concerning it.  I've taken an initial swipe at
figuring out how to optimize sync'ing methods.  It's based largely on
recommendations I've read on previous threads about fsync/O_SYNC and so on.
After reviewing, if anybody has recommendations on how to proceed then I'd
love to hear them.

Attached is a little program that basically does a bunch of sequential writes
to a file.  All of the sync'ing methods supported by PostgreSQL WAL can be
used.  Results are printed in microseconds.  Size and quantity of writes are
configurable.  The documentation is in the code (how to configure, build, run,
etc.).  I realize that this program doesn't reflect all of the possible
activities of a production database system, but I hope it's a step in the
right direction for this task.  I've used it to see differences in behavior
between the various sync'ing methods on various platforms.
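
In case the attachment doesn't survive the archive, here's a stripped-down
sketch of the shape of the test--one hardcoded method (open_datasync) instead
of all of them, with the same sizes as the runs below:

/* syncbench.c -- skeleton of the WAL sync benchmark (sketch only).
   Times CHUNKS sequential CHUNKSIZE writes under one sync method. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/time.h>

#define CHUNKSIZE (8 * 1024)
#define CHUNKS    (2 * 1024)    /* 16MB total, as in the runs below */

int main(void)
{
    static char chunk[CHUNKSIZE];
    struct timeval t0, t1;
    long usec;
    int i;

    /* open_datasync method: every write() is synchronous */
    int fd = open("testfile", O_WRONLY | O_CREAT | O_DSYNC, 0600);
    if (fd < 0) { perror("open"); exit(1); }
    memset(chunk, 'x', sizeof chunk);

    gettimeofday(&t0, NULL);
    for (i = 0; i < CHUNKS; i++)
        write(fd, chunk, CHUNKSIZE);
    gettimeofday(&t1, NULL);

    usec = (t1.tv_sec - t0.tv_sec) * 1000000L + (t1.tv_usec - t0.tv_usec);
    printf("open_datasync: %ld\n", usec);   /* microseconds, like the results */
    close(fd);
    return 0;
}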
   
Here's what I've found running the benchmark on some systems to which  
I have access.  The differences in behavior between platforms is quite vast.   
   
Summary first...   
   
   
PostgreSQL should be run on an old Apple Macintosh attached to
its own Hitachi disk array with 2GB of cache or so.  Use any sync method
except for fsync().
   
  
Anyway, there is *a lot* of variance in file synching behavior across
different hardware and O/S platforms.  It's probably not safe
to conclude much.  That said, here are some findings so far based on
tests I've run:

1.  under no circumstances do fsync() or fdatasync() seem to perform
better than opening files with O_SYNC or O_DSYNC
2.  where there are differences, opening files with O_SYNC or O_DSYNC
tends to be quite a bit faster
3.  fsync() seems to be the slowest where there are differences, and
O_DSYNC seems to be the fastest where results differ
4.  the safest thing to assert at this point is that
Solaris systems ought to use the O_DSYNC method for WAL.
  
---   
   
Test system(s)   
   
Athlon Linux:   
AMD Athlon XP2000, 512MB RAM, single (5400 or 7200?) RPM 20GB IDE disk,
reiserfs filesystem (3 something I think)   
SuSE Linux kernel 2.4.21-99   
   
Mac Linux:   
I don't know the specific model.  400MHz G3, 512MB, single IDE disk,   
ext2 filesystem   
Debian GNU/Linux 2.4.16-powerpc   
   
HP Intel Linux:   
Prolient HPDL380G3, 2 x 3GHz Xeon, 2GB RAM, SmartArray 5i 64MB cache,   
2 x 15,000RPM 36GB U320 SCSI drives mirrored.  I'm not sure if   
writes are cached or not.  There's no battery backup.   
ext3 filesystem.   
Redhat Enterprise Linux 3.0 kernel based on 2.4.21   
   
Dell Intel OpenBSD:   
Poweredge ?, single 1GHz PIII, 128MB RAM, single 7200RPM 80GB IDE disk,   
ffs filesystem   
OpenBSD 3.2 GENERIC kernel   
   
SUN Ultra2:   
Ultra2, 2 x 296MHz UltraSPARC II, 2GB RAM, 2 x 10,000RPM 18GB U160   
SCSI drives mirrored with Solstice DiskSuite.  UFS filesystem.   
Solaris 8.   
   
SUN E4500 + HDS Thunder 9570v   
E4500, 8 x 400MHz UltraSPARC II, 3GB RAM,   
HDS Thunder 9570v, 2GB mirrored battery-backed cache, RAID5 with a   
bunch of 146GB 10,000RPM FC drives.  LUN is on single 2GB FC fabric   
connection.   
Veritas filesystem (VxFS)   
Solaris 8.   
   
Test methodology:   
   
All test runs were done with CHUNKSIZE 8 * 1024, CHUNKS 2 * 1024,   
FILESIZE_MULTIPLIER 2, and SLEEP 5.  So a total of 16MB was sequentially  
written for each benchmark.  
   
Results are in microseconds.   
   
PLATFORM:   Athlon Linux   
buffered:   48220   
fsync:  74854397   
fdatasync:  75061357   
open_sync:  73869239   
open_datasync:  74748145   
Notes:  System mostly idle.  Even during tests, top showed about 95%   
idle.  Something's not right on this box.  All sync methods similarly   
horrible on this system.   
   
PLATFORM:   Mac Linux   
buffered:   58912   
fsync:  1539079   
fdatasync:  769058   
open_sync:  767094   
open_datasync:  763074   
Notes: system mostly idle.  fsync seems worst.  Otherwise, they seem   
pretty equivalent.  This is the fastest system tested.  
   
PLATFORM:   HP Intel Linux   
buffered:   33026   
fsync:  29330067   
fdatasync:  28673880   
open_sync:  8783417   
open_datasync:  8747971   
Notes: system idle.  O_SYNC and O_DSYNC methods seem to be a lot   
better on this platform than fsync & fdatasync.  
  
PLATFORM:   Dell Intel OpenBSD  
buffered:   511890  
fsync:  1769190  
fdatasync:    
open_sync:  1748764  
open_datasync:  1747433  
Notes: system idle.  I couldn't locate fdatasync() on this box, so I  
couldn't test it.  All sync methods seem equivalent and are very fast --  
though still trail the old Mac.  
  
PLATFORM:   SUN Ultra2  
buffered:   1814824  
fsync: