Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-25 Thread David Lang

On Tue, 26 Nov 2013, Xenofon Papadopoulos wrote:


We have been running several Postgres databases on VMs for the last 9
months. The largest one currently has a few hundred million rows
(~1.5T of data, ~100G of frequently queried data) and performs at ~1000
tps. Most of our transactions are part of a 2PC, which effectively results
in high I/O as asynchronous commit is disabled.
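
For readers not familiar with it, one round of 2PC looks roughly like this
(table and transaction names are made up, and max_prepared_transactions has
to be > 0 for PREPARE TRANSACTION to work); each phase is a separate WAL
flush, which is where the extra I/O comes from:

   psql -d appdb -c "BEGIN;
                     INSERT INTO orders VALUES (1, 'example');
                     PREPARE TRANSACTION 'tx_0001';"   # phase 1: prepare (flushed to disk)
   psql -d appdb -c "COMMIT PREPARED 'tx_0001';"       # phase 2: commit (flushed again)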

Main benefits so far:

- ESXi HA makes high availability completely transparent and reduces the
number of failover servers (we're running N+1 clusters)

- Our projects' load can often miss our expectations, and it changes over
time. Scaling up/down has helped us cope.


how do you add another server without having to do a massive data copy in the 
process?


David Lang


- Live relocation of databases helps with hardware upgrades and spreading
of load.

Main issues:

- We are not overprovisioning at all (using virtualization exclusively for
the management benefits), so we don't know its impact on performance.

- I/O has often been a bottleneck. We are not certain whether this is due
to the impact of virtualization or due to mistakes in our sizing and
configuration. So far we have been coping by spreading the load across
more spindles and by increasing the memory.





On Tue, Nov 26, 2013 at 1:26 AM, Merlin Moncure  wrote:


On Mon, Nov 25, 2013 at 4:57 PM, David Lang  wrote:

On Mon, 25 Nov 2013, Merlin Moncure wrote:


On Mon, Nov 25, 2013 at 2:01 PM, Lee Nguyen 

wrote:


Hi,

Having attended a few PGCons, I've always heard the remark from a few
presenters and attendees that Postgres shouldn't be run inside a VM; that
bare metal is the only way to go.

Here at work we were entertaining the idea of running our Postgres database
on our VM farm alongside our application VMs.  We are planning to run a few
Postgres synchronous replication nodes.

Why shouldn't we run Postgres in a VM?  What are the downsides?  Does anyone
have any metrics or benchmarks with the latest Postgres?



Unfortunately (and it really pains me to say this) we live in an
increasingly virtualized world and we just have to go ahead and deal
with it.  I work at a mid cap company and we have a zero tolerance
policy in terms of applications targeting hardware: in short, you
can't.  VMs have downsides: you get less performance per buck and have
another thing to fail but the administration advantages are compelling
especially for large environments.  Furthermore, for any size company
it makes less sense to run your own data center with each passing day;
the cloud providers are really bringing up their game. This is
economic specialization at work.



being pedantic, you can get almost all the management benefits on bare
metal, and you can rent bare metal from hosting providers; cloud VMs are not
the only option. 'Cloud' makes sense if you have a very predictably spiky
load and you can add/remove machines to meet that load, but if you end up
needing to have the machines running a significant percentage of the time,
dedicated boxes are cheaper (as well as faster)


Well, that depends on how you define 'most'.  The thing for me is
that for machines around the office (just like with people) about 10%
of them do 90% of the work.  Being able to slide them around based on
that (sometimes changing) need is a tremendous time and cost saver.
For application and infrastructure development, dealing with hardware
is just a distraction.   I'd rather click on some interface and say,
'this application needs 25k iops guaranteed' and then make a cost
driven decision on software optimization.  It's hard to let go after
decades of hardware innovation (the SSD revolution was the final shoe
to drop) but for me the time has finally come.  As recently as a year
ago I was arguing databases needed to be run against metal.

merlin




Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-25 Thread David Lang

On Mon, 25 Nov 2013, Merlin Moncure wrote:


On Mon, Nov 25, 2013 at 2:01 PM, Lee Nguyen  wrote:

Hi,

Having attended a few PGCons, I've always heard the remark from a few
presenters and attendees that Postgres shouldn't be run inside a VM. That
bare metal is the only way to go.

Here at work we were entertaining the idea of running our Postgres database
on our VM farm alongside our application VMs.  We are planning to run a few
Postgres synchronous replication nodes.

Why shouldn't we run Postgres in a VM?  What are the downsides? Does anyone
have any metrics or benchmarks with the latest Postgres?


Unfortunately (and it really pains me to say this) we live in an
increasingly virtualized world and we just have to go ahead and deal
with it.  I work at a mid cap company and we have a zero tolerance
policy in terms of applications targeting hardware: in short, you
can't.  VMs have downsides: you get less performance per buck and have
another thing to fail but the administration advantages are compelling
especially for large environments.  Furthermore, for any size company
it makes less sense to run your own data center with each passing day;
the cloud providers are really bringing up their game. This is
economic specialization at work.


being pedantic, you can get almost all the management benefits on bare metal,
and you can rent bare metal from hosting providers; cloud VMs are not the only
option. 'Cloud' makes sense if you have a very predictably spiky load and you
can add/remove machines to meet that load, but if you end up needing to have the
machines running a significant percentage of the time, dedicated boxes are
cheaper (as well as faster)


David Lang




Re: [PERFORM] Postgresql Performance on an HP DL385 and

2006-08-15 Thread David Lang

On Tue, 15 Aug 2006 [EMAIL PROTECTED] wrote:

This is also wrong. fsck is needed because the file system is broken.

nope, the file system *may* be broken. the dirty flag simply indicates
that the filesystem needs to be checked to find out whether or not it is
broken.


Ah, but if we knew it wasn't broken, then fsck wouldn't be needed, now
would it? So we assume that it is broken. A little bit of a game, but
it is important to me. If I assumed the file system was not broken, I
wouldn't run fsck. I run fsck, because I assume it may be broken. If
broken, it indicates potential corruption.


note that the ext3, reiserfs, jfs, and xfs developers (at least) consider
fsck necessary even for journaling filesystems. they just let you get away
without it being mandatory after an unclean shutdown.


David Lang



Re: [PERFORM] Hardware upgraded but performance still ain't good

2006-08-09 Thread David Lang

On Wed, 9 Aug 2006, Joshua D. Drake wrote:

Ahh and which companies would these be? As a representative of the most 
prominent one in the  US I can tell you that you are not speaking from a 
knowledgeable position.


note I said many, not all. I am aware that your company does not fall into
this category.


I know, but I am curious as to *what* companies. Any reputable PostgreSQL 
company is going to support Linux as a whole except maybe some fringe distros 
like Gentoo or RedFlag. Not to mention FreeBSD and Solaris.


I'm not going to name names in public, but I will point out that different
companies' definitions of what constitutes a 'fringe distro' differ. For
some, any Linux other than RedHat Enterprise or SuSE is a fringe distro
(with SuSE being a relatively recent addition; for a while RedHat was
frequently the only supported distro).


and please note, when I'm talking about support, it's not just postgresql 
support, but also hardware/driver support that can run into these problems


David Lang



Re: [PERFORM] Hardware upgraded but performance still ain't good

2006-08-09 Thread David Lang

On Wed, 9 Aug 2006, Joshua D. Drake wrote:

even many of the companies that offer support for postgres have this
problem. the explanation is always that they can't test every distro out
there so they pick a few and support those (this is one of the reasons why


Ahh and which companies would these be? As a representative of the most 
prominent one in the  US I can tell you that you are not speaking from a 
knowledgeable position.


note I said many, not all. I am aware that your company does not fall into
this category.


David Lang



Re: [PERFORM] Hardware upgraded but performance still ain't good

2006-08-09 Thread David Lang

On Wed, 9 Aug 2006, Stephen Frost wrote:


* David Lang ([EMAIL PROTECTED]) wrote:

there's a huge difference between 'works on debian' and 'supported on
debian'. I do use debian extensively (along with slackware on my personal
machines), so I am comfortable getting things to work. but 'supported'
means that when you run into a problem you can call for help without being
told 'sorry, switch distros, then call us back'.


Have you ever actually had that happen?  I haven't, and I've called
support for a number of different issues for various commercial
software.  In the end it might boil down to some distribution-specific
issue that they're not willing to fix, but honestly that's pretty rare.


unfortunately I have, repeatedly, with different products.

if you can manage to get past the first couple of levels of support to
people who really understand things rather than just using checklists you
are more likely to get help, but even there I've run into people who seem
eager to take the easy way out by assuming that it must be a distro thing
rather than anything with their product (even in cases where it ended up
being a simple config thing)


David Lang



Re: [PERFORM] Hardware upgraded but performance still ain't good

2006-08-08 Thread David Lang

On Tue, 8 Aug 2006, Stephen Frost wrote:


* Alex Turner ([EMAIL PROTECTED]) wrote:

First off - very few third party tools support debian.  Debian is a sure
fire way to have an unsupported system.  Use RedHat or SuSe (flame me all
you want, it doesn't make it less true).


Yeah, actually, it does make it less true since, well, it's really not
all that true to begin with.

What you're probably intending to say is that fewer companies say "Works
with Debian!" on their advertising material or list it as "officially
supported".  I've had *very* few problems running commercial apps on
Debian (including things like Oracle and IBM SAN management software).
Generally it's just take the rpms and either install them *using* rpm
(which is available in Debian...) or use alien to convert them to a
tarball and/or deb.


there's a huge difference between 'works on debian' and 'supported on
debian'. I do use debian extensively (along with slackware on my personal
machines), so I am comfortable getting things to work. but 'supported'
means that when you run into a problem you can call for help without being
told 'sorry, switch distros, then call us back'.


even many of the companies that offer support for postgres have this
problem. the explanation is always that they can't test every distro out
there so they pick a few and support those (this is one of the reasons why
I am watching ubuntu with great interest: it's debian under the covers,
but it's starting to get recognition from companies' support groups)


David Lang




Re: [PERFORM] import performance

2006-03-13 Thread David Lang

On Tue, 14 Mar 2006, Chris wrote:


The only other thing I can see is the old server is ext2:
/dev/hda4 on / type ext2 (rw,errors=remount-ro)

the new one is ext3:
/dev/hda2 on / type ext3 (rw)


this is actually a fairly significant difference.

with ext3, depending on the journaling mode, your data can get written
twice: once to the journal and a second time to the spot on the disk where
it's actually going to live (in the default data=ordered mode only metadata
goes through the journal, but every commit still costs extra journal I/O).


in addition there are significant differences in how things are arranged
on disk between the two filesystems (overridable at mount time, but that
only changes future new files). the ext3 layout is supposed to be better
for a general-purpose filesystem, but I've found common cases (lots of
files and directories) where it's significantly slower, and I think
postgres will fall into those cases.


try making an xfs filesystem for your postgres data and see what sort of
performance you get on it.
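
something along these lines (device and mount point are just examples, and
obviously move/back up the data first):

   mkfs.xfs /dev/hdb1                               # build the new filesystem
   mount -o noatime /dev/hdb1 /var/lib/postgresql   # noatime saves a write per read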


David Lang



Re: [PERFORM] x206-x225

2006-03-11 Thread David Lang

On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:


Date: Sat, 11 Mar 2006 09:17:09 +0100
From: Joost Kraaijeveld <[EMAIL PROTECTED]>
To: David Lang <[EMAIL PROTECTED]>
Cc: Richard Huxton , pgsql-performance@postgresql.org
Subject: Re: [PERFORM] x206-x225

On Fri, 2006-03-10 at 23:57 -0800, David Lang wrote:

On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:


On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:

Your ATA disk is lying about disk caching being turned off. Assuming
each insert is in a separate transaction, then it's not going to do
10,000 / 6 = 1667 transactions/sec - that's faster than its rotational
speed.

Could you explain the calculation? Why should the number of transactions
be related to the rotational speed of the disk, without saying anything
about the number of bytes per rotation?


each transaction requires a sync to the disk, a sync requires a real
write (which you then wait for), so you can only do one transaction per
rotation.

Not according to a conversation I had with Western Digital about the
write performance of my own SATA disks. From what I understand of their
explanation, their disks are limited by MB/sec and not by the number
of writes/second, e.g. I could write 50 MB/sec *in 1 bit/write* on my
disk. This would suggest that the maximum transactions of my disk
(overhead of OS and PostgreSQL ignored) would be 50MB / (transaction
size in MB) per second. Or am I missing something (which would not
surprise me, as I do not understand the performance of my system at
all ;-))?


but if you do a 1 bit write, and wait for it to complete, and then do
another 1 bit write that belongs on disk immediately after the first one
(and wait for it to complete) you have to wait until the disk rotates to
the point that it can make the write before it's really safe on disk.


so you can do one transaction in less than one rotation, but if you do 50
transactions you must wait at least 49 (and a fraction) rotations.


if the disk cache is turned on then you don't have to wait for this, but
you will also lose the data if you lose power, so it's really not safe.
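
as a rough worked example (assuming the write cache really is off and one
synced commit per rotation):

    7200 rpm / 60 = 120 rotations/sec  -> at most ~120 commits/sec
   10000 rpm / 60 = 166 rotations/sec  -> at most ~166 commits/sec

which is why Richard flagged 1667 transactions/sec as impossible without
the cache.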


David Lang



Re: [PERFORM] x206-x225

2006-03-10 Thread David Lang

On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:


On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:

Your ATA disk is lying about disk caching being turned off. Assuming
each insert is in a separate transaction, then it's not going to do
10,000 / 6 = 1667 transactions/sec - that's faster than its rotational
speed.

Could you explain the calculation? Why should the number of transactions
be related to the rotational speed of the disk, without saying anything
about the number of bytes per rotation?


each transaction requires a sync to the disk, a sync requires a real 
write (which you then wait for), so you can only do one transaction per 
rotation.


David Lang



Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread David Lang

On Thu, 16 Feb 2006, Mark Lewis wrote:


On Thu, 2006-02-16 at 17:51 -0500, Greg Stark wrote:

Data types which could probably provide a useful function for f would be
int2, int4, oid, and possibly int8 and text (at least for SQL_ASCII).


How exactly do you imagine doing this for text?

I could see doing it for char(n)/varchar(n) where n<=4 in SQL_ASCII though.



In SQL_ASCII, just take the first 4 characters (or 8, if using a 64-bit
sortKey as elsewhere suggested).  The sorting key doesn't need to be a
one-to-one mapping.


that would violate your second constraint ( f(a)==f(b) iff (a==b) ) --
for example 'abcde' and 'abcdf' share the same first four characters, so
f() would report them equal even though the strings differ.

if you could drop that constraint (the cost of which would be extra 'real'
compares within a bucket) then a helper function per datatype could work
as you describe.


David Lang



Re: [PERFORM] [GENERAL] Creation of tsearch2 index is very

2006-01-21 Thread David Lang

On Sat, 21 Jan 2006, Tom Lane wrote:


Ron <[EMAIL PROTECTED]> writes:

At 07:23 PM 1/20/2006, Tom Lane wrote:

Well, we're trying to split an index page that's gotten full into
two index pages, preferably with approximately equal numbers of items in
each new page (this isn't a hard requirement though).



Maybe we are over thinking this.  What happens if we do the obvious
and just make a new page and move the "last" n/2 items on the full
page to the new page?


Search performance will go to hell in a handbasket :-(.  We have to make
at least some effort to split the page in a way that will allow searches
to visit only one of the two child pages rather than both.


does the order of the items within a given page matter? if not, this sounds
like a partial quicksort algorithm would work. you don't need to fully
sort things, but you do want to make sure that everything on the first
page is 'less' than everything on the second page so you can skip passes
that don't cross a page boundary



It's certainly true though that finding the furthest pair is not a
necessary component of that.  It's reasonable if you try to visualize
the problem in 2D or 3D, but I'm not sure that that geometric intuition
holds up in such a high-dimensional space as we have here.


I will say that I'm not understanding the problem well enough to
understand the multi-dimensional nature of it.


David Lang



Re: [PERFORM] How to handle a large DB and simultaneous accesses?

2006-01-10 Thread David Lang

On Tue, 10 Jan 2006, Charles A. Landemaine wrote:


Hello,

I have to develop a companies search engine (looks like the Yellow
pages). We're using PostgreSQL at the company, and the initial DB is
2GB large, as it
has companies from the entire world, with a fair amount of information.

What reading do you suggest so that we can develop the search engine
core, in order that the result pages show up instantly, no matter the
heavy load and
the DB size. The DB is 2GB but should grow to up to 10GB in 2 years,
and there should be 250,000 unique visitors per month by the end of
the year.

Are there special techniques? Maybe there's a way to sort of cache
search results? We're using PHP5 + phpAccelerator.
Thanks,


frankly that is a small enough chunk of data compared to available memory
sizes that I think your best bet is to plan on having enough ram that you
only do disk I/O for writes and at boot.


a dual-socket Opteron system can hold 16G with 2G memory modules (32G as
4G modules become readily available over the next couple of years). this
should be enough to keep your data and indexes in ram at all times. if you
find that other system processes push the data out of ram, consider loading
the data from disk into a ramfs filesystem, just make sure you don't update
the ram-only copy (or if you do, that you have replication set up to
replicate from the ram copy to a copy on real disks somewhere). depending
on your load you could go with single-core or dual-core chips (and the CPUs
are a small enough cost compared to this much ram that you may as well go
with the dual-core CPUs)
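
a rough sketch of the ramfs approach (paths are examples, and remember that
everything in it is gone on a crash or reboot):

   mkdir /mnt/pgram
   mount -t ramfs ramfs /mnt/pgram
   cp -a /var/lib/postgresql/data /mnt/pgram/   # copy the cluster into ram
   pg_ctl -D /mnt/pgram/data start              # serve reads from the ram copy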


now even with your data in ram you can slow down if your queries, indexes, 
and other settings are wrong, but if performance is important you should 
be able to essentially eliminate disks for databases of this size.


David Lang



Re: [PERFORM] help tuning queries on large database

2006-01-06 Thread David Lang

On Fri, 6 Jan 2006, Tom Lane wrote:


Date: Fri, 06 Jan 2006 18:47:55 -0500
From: Tom Lane <[EMAIL PROTECTED]>
To: peter royal <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] help tuning queries on large database

peter royal <[EMAIL PROTECTED]> writes:

So, my question is, is there anything I can do to boost performance
with what I've got, or am I in a position where the only 'fix' is
more faster disks? I can't think of any schema/index changes that
would help, since everything looks pretty optimal from the 'explain
analyze' output. I'd like to get a 10x improvement when querying from
the 'cold' state.


I don't think you have any hope of improving the "cold" state much.
The right way to think about this is not to be in the "cold" state.
Check your kernel parameters and make sure it's not set to limit
the amount of memory used for cache (I'm not actually sure if there
is such a limit on Linux, but there definitely is on some other Unixen).


Linux doesn't have any ability to limit the amount of memory used for
caching (there are periodic requests for such a feature)


David Lang


Look around and see if you can reduce the memory used by processes,
or even better, offload non-database tasks to other machines.

Basically you need to get as much of the database as you can to stay
in disk cache.

regards, tom lane



Re: [PERFORM] Help in avoiding a query 'Warm-Up' period/shared buffer

2006-01-05 Thread David Lang

On Thu, 5 Jan 2006, Mark Liberman wrote:


Obviously, I will be testing this - but it might take a few days, as I haven't
figured out how to simulate the "period of inactivity" to get the data flushed
out of the cache ... so I have to run this each morning.


cat large_file >/dev/null

will probably do a pretty good job of this (especially if large_file is
noticeably larger than the amount of ram you have)
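
if you don't have a convenient large file lying around, something like this
works (the size is an example -- make it bigger than your ram):

   dd if=/dev/zero of=large_file bs=1M count=16384   # create a 16GB junk file
   cat large_file > /dev/null                        # reading it back evicts everything else from the OS cache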


David Lang



Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-26 Thread David Lang

On Mon, 26 Dec 2005, Alex Turner wrote:



Yes, but those blocks in RAID 10 are largely irrelevant as they go to
independent disks.  In RAID 5 you have to write parity to an 'active'
drive that is part of the stripe.  (They are irrelevant unless of
course you are maxing out your SCSI bus - yet another reason why SATA
can be faster than SCSI, particularly in RAID 10, where every channel is
independent).


I don't understand your 'active' vs 'inactive' drive argument, in raid 1 
or 1+0 all drives are active.


with good components you need to worry about maxing out your PCI bus as
much as any other one (this type of thing is where hardware raid has a
definite advantage, since the card handles the extra I/O, not your system)



Sorry - my math for RAID 5 was a bit off - I don't know why I was
considering only a three drive situation - which is the worst.  It's
n+1, you are right.  Still, for small arrays that's a big penalty.
Still, there is definitely a penalty, contrary to the assertion of the
original poster.

I agree totally that the read+parity-calc+write in the worst case is
totally bad, which is why I always recommend people should _never ever_
use RAID 5.   In this day and age of large capacity chassis, and large
capacity SATA drives, RAID 5 is totally inappropriate IMHO for _any_
application, least of all databases.

In reality I have yet to benchmark a system where RAID 5 on the same
number of drives with 8 drives or less in a single array beat a RAID
10 with the same number of drives.  I would definitely be interested
in a SCSI card that could actually achieve the theoretical performance
of RAID 5, especially under Linux.


but it's not a 'same number of drives' comparison you should be making.

if you have an 8-drive RAID5 array you need to compare it with a 14-drive
RAID1/10 array.



With RAID 5 you get to watch you system crumble and fail when a drive
fails and the array goes into a failed state.  It's just not worth it.


speed is worth money (and therefore number of drives) in some cases, but
not in all cases. also the speed penalty when you have a raid drive fail
varies based on your controller


it's wrong to flatly rule out any RAID configuration, they all have their 
place and the important thing is to understand what the advantages and 
disadvantages are for each of them so you can know when to use each one.


for example I have a situation I am looking at where RAID0 is looking 
appropriate for a database (a multi-TB array that gets completely reloaded 
every month or so as data expires and new data is loaded from the 
authoritative source, adding another 16 drives to get redundancy isn't 
reasonable)


David Lang


Alex.


On 12/26/05, David Lang <[EMAIL PROTECTED]> wrote:

On Mon, 26 Dec 2005, Alex Turner wrote:


It's irrelevant what controller, you still have to actually write the
parity blocks, which slows down your write speed because you have to
write n+n/2 blocks instead of just n blocks, making the system write
50% more data.

RAID 5 must write 50% more data to disk, therefore it will always be slower.


raid5 writes n+1 blocks, not n+n/2 (unless n=2, i.e. a 3-disk raid). you can
have a 15+1 disk raid5 array, for example

however raid1 (and raid10) have to write 2*n blocks to disk. so if you are
talking about pure I/O needed, raid5 wins hands down. (the same 16 drives
would be an 8+8 array)

what slows down raid 5 is that to modify a block you have to read blocks
from your drives to re-calculate the parity. this interleaving of
reads and writes when all you are logically doing is writes can really hurt.
(this is why I asked the question that got us off on this tangent: when
doing new writes to an array you don't have to read the blocks as they are
blank, assuming your caching is enough so that you can write blocksize*n
before the system starts actually writing the data)

David Lang


Alex.

On 12/25/05, Michael Stone <[EMAIL PROTECTED]> wrote:

On Sat, Dec 24, 2005 at 05:45:20PM -0500, Ron wrote:

Caches help, and the bigger the cache the better, but once you are
doing enough writes fast enough (and that doesn't take much even with
a few GBs of cache) the recalculate-checksums-and-write-new-ones
overhead will decrease the write speed of real data.  Bear in mind
that the HD's _raw_ write speed hasn't been decreased.  Those HD's
are pounding away as fast as they can for you.  Your _effective_ or
_data level_ write speed is what decreases due to overhead.


You're overgeneralizing. Assuming a large cache and a sequential write,
there's need be no penalty for raid 5. (For random writes you may
need to read unrelated blocks in order to calculate parity, but for
large sequential writes the parity blocks should all be read from
cache.) A modern cpu can calculate parity for raid 5 on the order of
gigabytes per second, and even crummy embedded processors can do
hund

Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-26 Thread David Lang

On Mon, 26 Dec 2005, Alex Turner wrote:


It's irrelevant what controller, you still have to actually write the
parity blocks, which slows down your write speed because you have to
write n+n/2 blocks instead of just n blocks, making the system write
50% more data.

RAID 5 must write 50% more data to disk, therefore it will always be slower.


raid5 writes n+1 blocks, not n+n/2 (unless n=2, i.e. a 3-disk raid). you can
have a 15+1 disk raid5 array, for example


however raid1 (and raid10) have to write 2*n blocks to disk. so if you are
talking about pure I/O needed, raid5 wins hands down. (the same 16 drives
would be an 8+8 array)
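
as a worked example, to put down 15 blocks of new data: a 15+1 raid5 writes
16 blocks (15 data + 1 parity, ~7% overhead), while the same 16 drives as
an 8+8 raid10 write 30 blocks (every block mirrored, 100% overhead).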


what slows down raid 5 is that to modify a block you have to read blocks
from your drives to re-calculate the parity. this interleaving of
reads and writes when all you are logically doing is writes can really hurt.
(this is why I asked the question that got us off on this tangent: when
doing new writes to an array you don't have to read the blocks as they are
blank, assuming your caching is enough so that you can write blocksize*n
before the system starts actually writing the data)


David Lang


Alex.

On 12/25/05, Michael Stone <[EMAIL PROTECTED]> wrote:

On Sat, Dec 24, 2005 at 05:45:20PM -0500, Ron wrote:

Caches help, and the bigger the cache the better, but once you are
doing enough writes fast enough (and that doesn't take much even with
a few GBs of cache) the recalculate-checksums-and-write-new-ones
overhead will decrease the write speed of real data.  Bear in mind
that the HD's _raw_ write speed hasn't been decreased.  Those HD's
are pounding away as fast as they can for you.  Your _effective_ or
_data level_ write speed is what decreases due to overhead.


You're overgeneralizing. Assuming a large cache and a sequential write,
there's need be no penalty for raid 5. (For random writes you may
need to read unrelated blocks in order to calculate parity, but for
large sequential writes the parity blocks should all be read from
cache.) A modern cpu can calculate parity for raid 5 on the order of
gigabytes per second, and even crummy embedded processors can do
hundreds of megabytes per second. You may have run into some lousy
implementations, but you should be much more specific about what
hardware you're talking about instead of making sweeping
generalizations.


Side Note: people often forget the other big reason to use RAID 10
over RAID 5.  RAID 5 is always only 2 HD failures from data
loss.  RAID 10 can lose up to 1/2 the HD's in the array w/o data loss
unless you get unlucky and lose both members of a RAID 1 set.


IOW, your RAID 10 is only 2 HD failures from data loss also. If that's
an issue you need to go with RAID 6 or add another disk to each mirror.

Mike Stone



Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-25 Thread David Lang

On Sat, 24 Dec 2005, Luke Lonergan wrote:


David,


now hot-swap may not be supported on all interface types, that may be what
you have run into, but with SCSI or SATA you should be able to hot-swap
with the right controller.


That's actually the problem - Linux hot swap is virtually non-functional for 
SCSI.  You can write into the proper places in /proc, then remove and rescan to 
get a new drive up, but I've found that the resulting OS state is flaky.  This 
is true of the latest 2.6 kernels and LSI and Adaptec SCSI controllers.

The problems I've seen are  with Linux, not the controllers.


Thanks for the clarification, I knew that PATA didn't do hotswap, and I've 
seen discussions on the linux-kernel list about SATA hotswap being worked 
on, but I thought that scsi handled it. how recent a kernel have you had 
problems with?


David Lang



Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-24 Thread David Lang

On Sat, 24 Dec 2005, Luke Lonergan wrote:


Recently, I helped a company named DeepData to improve their dbms
performance, which was a combination of moving them to software RAID50
on Linux and getting them onto Bizgres.  The disk subsystem sped up on
the same hardware (minus the HW RAID card) by over a factor of 10.  The
downside is that SW RAID is a pain in the neck for management - you have
to shut down the Linux host when a disk fails to replace it.


Luke, you should not need to shut down the linux host when a disk fails.

you should be able to use mdadm to mark the drive as failed, then remove 
it from the system and replace it, then use mdadm to add the drive to the 
array.
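
the sequence is something like this (device names are examples):

   mdadm /dev/md0 --fail /dev/sdb1      # mark the dying disk as failed
   mdadm /dev/md0 --remove /dev/sdb1    # drop it out of the array
   # physically swap the drive, then:
   mdadm /dev/md0 --add /dev/sdb1       # add the replacement; the rebuild starts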


I'm fighting through a double disk failure on my system at home and when I
hit a bad spot on a drive (failing it from the array) I can just re-add it
without having to restart everything (if it's the second drive I will have
to stop and restart the array, but that's because the entire array has
failed at that point)


now hot-swap may not be supported on all interface types, that may be what 
you have run into, but with SCSI or SATA you should be able to hot-swap 
with the right controller.


David Lang



Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-24 Thread David Lang

On Sat, 24 Dec 2005, Ron wrote:


At 02:50 PM 12/24/2005, Frank Wiles wrote:

Juan Casero <[EMAIL PROTECTED]> wrote:

> Sorry folks.  I had a couple of glasses of wine as I wrote this.
> Anyway I originally wanted the box to have more than two drives so I
> could do RAID 5 but that is going to cost too much.  Also, contrary
> to my statement below it seems to me I should run the 32 bit
> postgresql server on the 64 bit kernel. Would you agree this will
> probably yield the best performance?I know it depends alot on the
> system but for now this database is about 20 gigabytes. Not too large
> right now but it may grow 5x in the next year.

  You definitely DO NOT want to do RAID 5 on a database server.  That
  is probably the worst setup you could have, I've seen it have lower
  performance than just a single hard disk.

  RAID 1 and RAID 1+0 are optimal, but you want to stay far away from
  RAID 5.  IMHO RAID 5 is only useful on near line backup servers or
  Samba file servers where space is more important than speed.

That's a bit misleading.  RAID 5 excels when you want read speed but don't
care as much about write speed.  Writes are typically ~2/3 the speed of reads
on a typical decent RAID 5 setup.


So if you have tables that are read often and written to rarely or not at 
all, putting them on RAID 5 is optimal.  In both data mining like and OLTP 
like apps there are usually at least some such tables.


raid 5 is bad for random writes as you state, but how does it do for
sequential writes (for example data mining where you do a large import at
one time, but seldom do other updates)? I'm assuming a controller with a
reasonable amount of battery-backed cache.


David Lang



Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-22 Thread David Lang

On Thu, 22 Dec 2005, Juan Casero wrote:


Ok thanks.  I think I will go with 64 bit everything on the box.  If I can get
the Sun Fire V20Z then I will stick with Solaris 10 x86 and download the 64
bit PostgreSQL 8.1 binaries from blastwave.org.   I develop the PHP code to
my DSS system on my Windows XP laptop.  Normally, I test the code on this
laptop but let it hit the live database when I want to run some tests.  Well
just this afternoon I installed PostgreSQL 8.1.1 on my windows laptop and
rebuilt the entire live database instance on there from a pg_dump
archive.   I am blown away by the performance increase in PostgreSQL 8.1.x.
Has anyone else had a chance to test it?   All the queries I run against it
are remarkably fast but more importantly I can see that the two cores of my
Hyper Threaded P4 are being used.   One of the questions I posted on this
list was whether PostgreSQL could make use of the large number of cores
available on the Ultrasparc T1000/T2000 cores.  I am beginning to think that
with PostgreSQL 8.1.x the buffer manager could indeed use all those cores.
This could make running a DSS or OLTP on an Ultrasparc T1000/T2000 with
PostgreSQL a much better bargain than on an intel system.  Any thoughts?


if you have enough simultaneous transactions, and your I/O systems (disk
and memory interfaces) can keep up with your needs, then postgres can use
quite a few cores.


there are some limits that will show up with more cores, but I don't think
it's well known where they are (this will also be very dependent on your
workload). there was a discussion within the last month or two that hit the
postgres weekly news where more attention is being paid to the locking
mechanisms used, so this is an area under active development (note
especially that some locking strategies that work well with multiple full
cores can be crippling with virtual cores (Intel HT etc)).


but it boils down to the fact that there just isn't enough experience with
the new Sun systems to know how well they will work. they could end up
being fabulous speed demons, or dogs (and it could even be both, depending
on your workload)


David Lang


Thanks,
Juan

On Thursday 22 December 2005 22:12, David Lang wrote:

On Wed, 21 Dec 2005, Juan Casero wrote:

Date: Wed, 21 Dec 2005 22:31:54 -0500
From: Juan Casero <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

Sorry folks.  I had a couple of glasses of wine as I wrote this.  Anyway
I originally wanted the box to have more than two drives so I could do
RAID 5 but that is going to cost too much.  Also, contrary to my
statement below it seems to me I should run the 32 bit postgresql server
on the 64 bit kernel. Would you agree this will probably yield the best
performance?


you definitely need a 64 bit kernel to address as much ram as you will
need.

the question of 32 bit vs 64 bit postgres needs to be benchmarked, but my
inclination is that you probably do want 64 bit for that as well.

64 bit binaries are slightly larger than 32 bit ones (less so on x86/AMD64
than on any other mixed platform though), but the 64 bit version also has
access to twice as many registers as a 32 bit one, and the Opteron chips
have some other features that become available (or more useful) in 64 bit
mode

like everything else this needs benchmarks to prove with your workload
(I'm trying to get some started, but haven't had a chance yet)

David Lang


I know it
depends alot on the system but for now this database is about 20
gigabytes. Not too large right now but it may grow 5x in the next year.

Thanks,
Juan

On Wednesday 21 December 2005 22:09, Juan Casero wrote:

I just sent my boss an email asking him for a Sun v20z with dual 2.2 Ghz
opterons, 2 Gigs of RAM and RAID 1.  I would have liked a better server
capable of RAID but that seems to be out of his budget right now.  Ok so
I assume I get this Sun box.  Most likely I will go with Linux since it
is a fair bet he doesn't want to pay for the Solaris 10 x86 license.
Although I kind of like the idea of using Solaris 10 x86 for this.   I
will assume I need to install the x64 kernel that comes with say Fedora
Core 4.  Should I run the Postgresql 8.x binaries in 32 bit mode or 64
bit mode?   My instinct tells me 64 bit mode is most efficient for our
database size about 20 gigs right now but may grow to 100 gigs in a year
or so.  I just finished loading a 20 gig database on a dual 900 Mhz
Ultrasparc III system with 2 gigs of ram and about 768 megs of shared
memory available for the posgresql server running Solaris 10.  The load
has smoked a P4 3.2 Ghz system I am using also with 2 gigs of ram
running postgresql 8.0.3.   I mean I started the sparc load after the P4
load.  The sparc load has finished already rebuilding the database from
a pg_dump file but the P4 system is still going.  The p4 has 1.3 Gigs of
shared memory

Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-22 Thread David Lang

On Wed, 21 Dec 2005, Juan Casero wrote:


Date: Wed, 21 Dec 2005 22:31:54 -0500
From: Juan Casero <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

Sorry folks.  I had a couple of glasses of wine as I wrote this.  Anyway I
originally wanted the box to have more than two drives so I could do RAID 5
but that is going to cost too much.  Also, contrary to my statement below it
seems to me I should run the 32 bit postgresql server on the 64 bit kernel.
Would you agree this will probably yield the best performance?


you definitely need a 64 bit kernel to address as much ram as you will
need.


the question of 32 bit vs 64 bit postgres needs to be benchmarked, but my
inclination is that you probably do want 64 bit for that as well.


64 bit binaries are slightly larger than 32 bit ones (less so on x86/AMD64
than on any other mixed platform though), but the 64 bit version also has
access to twice as many registers as a 32 bit one, and the Opteron chips
have some other features that become available (or more useful) in 64 bit
mode


like everything else this needs benchmarks to prove with your workload 
(I'm trying to get some started, but haven't had a chance yet)
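
for whatever it's worth, the quick-and-dirty comparison I have in mind is
just running pgbench against a 32 bit and a 64 bit build (scale and client
counts here are arbitrary examples):

   pgbench -i -s 100 testdb        # initialize a scale-100 test database (roughly 1.5GB)
   pgbench -c 10 -t 1000 testdb    # 10 clients, 1000 transactions each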


David Lang


I know it
depends alot on the system but for now this database is about 20 gigabytes.
Not too large right now but it may grow 5x in the next year.

Thanks,
Juan

On Wednesday 21 December 2005 22:09, Juan Casero wrote:

I just sent my boss an email asking him for a Sun v20z with dual 2.2 Ghz
opterons, 2 Gigs of RAM and RAID 1.  I would have liked a better server
capable of RAID but that seems to be out of his budget right now.  Ok so I
assume I get this Sun box.  Most likely I will go with Linux since it is a
fair bet he doesn't want to pay for the Solaris 10 x86 license.  Although I
kind of like the idea of using Solaris 10 x86 for this.   I will assume I
need to install the x64 kernel that comes with say Fedora Core 4.  Should I
run the Postgresql 8.x binaries in 32 bit mode or 64 bit mode?   My
instinct tells me 64 bit mode is most efficient for our database size about
20 gigs right now but may grow to 100 gigs in a year or so.  I just
finished loading a 20 gig database on a dual 900 Mhz Ultrasparc III system
with 2 gigs of ram and about 768 megs of shared memory available for the
posgresql server running Solaris 10.  The load has smoked a P4 3.2 Ghz
system I am using also with 2 gigs of ram running postgresql 8.0.3.   I
mean I started the sparc load after the P4 load.  The sparc load has
finished already rebuilding the database from a pg_dump file but the P4
system is still going.  The p4 has 1.3 Gigs of shared memory allocated to
postgresql.  How about them apples?


Thanks,
Juan

On Wednesday 21 December 2005 18:57, William Yu wrote:

Juan Casero wrote:

Can you elaborate on the reasons the opteron is better than the Xeon
when it comes to disk io?   I have a PostgreSQL 7.4.8 box running a
DSS.   One of our


Opterons have a 64-bit IOMMU -- Xeons don't. That means in 64-bit mode,
for transfers to memory above 4GB, the OS must allocate memory below 4GB,
DMA to that block, and then the CPU must do extra work copying the memory
above 4GB. Versus on the Opteron, it's done by the IO adaptor using DMA in
the background.



Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

2005-12-20 Thread David Lang

On Tue, 20 Dec 2005, Juan Casero wrote:


Date: Tue, 20 Dec 2005 19:50:47 -0500
From: Juan Casero <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] What's the best hardver for PostgreSQL 8.1?

Can you elaborate on the reasons the opteron is better than the Xeon when it
comes to disk io?


the opteron is cheaper, so you have more money to spend on disks :-)

also when you go into multi-cpu systems the front-side-bus design of the
Xeons can easily become your system bottleneck, so that you can't take
advantage of all the CPUs because they stall waiting for memory accesses;
Opteron systems have a memory bus per socket, so the more CPUs you have
the more memory bandwidth you have.




The database itself is about 20 gigs
but I want it to scale to 100 gigs.


how large is the working set? in your tests you ran into swapping on your 
1.2G system, buying a dual opteron with 16gigs of ram will allow you to 
work with much larger sets of data, and you can go beyond that if needed.


David Lang



Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread David Lang

On Tue, 20 Dec 2005, Alan Stange wrote:


David Lang wrote:

On Tue, 20 Dec 2005, Alan Stange wrote:


Jignesh K. Shah wrote:

I guess it depends on what you term as your metric for measurement.
If it is just one query's execution time, it may not be the best on
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously,
UltraSPARC T1 can do comparatively well, provided the application
can scale along with it.


I just want to clarify one issue here.   It's my understanding that the 
8-core, 4 hardware thread (known as strands) system is seen as a 32 cpu 
system by Solaris. So, one could have up to 32 postgresql processes 
running in parallel on the current systems (assuming the application can 
scale).


note that like hyperthreading, the strands aren't full processors; their
efficiency depends on how much the other threads sharing the core stall
waiting for external things.
Exactly.   Until we have a machine in hand (and substantial technical 
documentation) we won't know all the limitations.


by the way, when you do get your hands on it I would be interested to hear 
how Linux compares to Solaris on the same hardware.


given how new the hardware is, it's also likely that linux won't identify
the hardware properly (either seeing it as 32 true processors or just as 8
without being able to use the strands), so the initial tests may not
reflect what Linux performance will be a release or two from now.


David Lang



Re: [PERFORM] PostgreSQL and Ultrasparc T1

2005-12-20 Thread David Lang

On Tue, 20 Dec 2005, Alan Stange wrote:


Jignesh K. Shah wrote:

I guess it depends on what you term as your metric for measurement.
If it is just one query's execution time, it may not be the best on
UltraSPARC T1.
But if you have more than 8 complex queries running simultaneously,
UltraSPARC T1 can do comparatively well, provided the application
can scale along with it.


I just want to clarify one issue here.   It's my understanding that the 
8-core, 4 hardware thread (known as strands) system is seen as a 32 cpu 
system by Solaris. 
So, one could have up to 32 postgresql processes running in parallel on the 
current systems (assuming the application can scale).


note that like hyperthreading, the strands aren't full processors; their
efficiency depends on how much the other threads sharing the core stall
waiting for external things.


David Lang



Re: [PERFORM] separate drives for WAL or pgdata files

2005-12-19 Thread David Lang

On Mon, 19 Dec 2005, David Lang wrote:

this is getting dangerously close to being able to fit in ram. I saw an
article over the weekend that Samsung is starting to produce 8G DIMMs that
can go 8 to a controller (instead of the current 4 per controller); when
motherboards come out that support this you can have 64G of ram per opteron
socket. it will be pricey, but the performance


a message on another mailing list got me thinking: there is the horas
project that is aiming to put together 16-socket Opteron systems within a
year (they claim sooner, but I'm being pessimistic ;-). combine this with
these 8G dimms and you can have a SINGLE system with 1TB of ram on it
(right at the limit of the Opteron's 40 bit external memory addressing)


_wow_

and the thing is that it won't take much change in the software stack to
deal with this.


Linux is already running on machines with 1TB of ram (and 512 CPUs) so it
will run very well. Postgres probably needs some attention to its locks,
but it is getting that attention now (and it will get more with the Sun
Niagara chips being able to run 8 processes simultaneously)


just think of the possibilities (if you have the money to afford the super 
machine :-)


David Lang




Re: [PERFORM] separate drives for WAL or pgdata files

2005-12-19 Thread David Lang

On Mon, 19 Dec 2005, Anjan Dave wrote:


I am not sure if there's an obvious answer to this... If there's a choice
of an external RAID10 (Fiber Channel, 6 or 8 15Krpm drives), what is more
beneficial to store on it, the WAL or the database files? One or the other
would go on the local RAID10 (4 drives, 15Krpm) along with the OS.


the WAL is small compared to the data, and it's mostly sequential access, 
so it doesn't need many spindles, it just needs them more-or-less 
dedicated to the WAL and not distracted by other things.


the data is large (by comparison), and is accessed randomly, so the more 
spindles that you can throw at it the better.


In your place I would consider making the server's internal drives into 
two raid1 pairs (one for the OS, one for the WAL), and then going with 
raid10 on the external drives for your data
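
one common way to point the WAL at its own raid1 pair is just a symlink
(paths are examples; stop the server first):

   pg_ctl -D $PGDATA stop
   mv $PGDATA/pg_xlog /wal_mirror/pg_xlog       # move the WAL onto the dedicated pair
   ln -s /wal_mirror/pg_xlog $PGDATA/pg_xlog    # symlink so postgres still finds it
   pg_ctl -D $PGDATA start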



This is a very busy database with high concurrent connections, random
reads and writes. Checkpoint segments are 300 and interval is 6 mins.
Database size is less than 50GB.


this is getting dangerously close to being able to fit in ram. I saw an
article over the weekend that Samsung is starting to produce 8G DIMMs
that can go 8 to a controller (instead of the current 4 per controller);
when motherboards come out that support this you can have 64G of ram per
opteron socket. it will be pricey, but the performance


in the meantime you can already go 4G/slot * 4 slots/socket and get 64G on 
a 4-socket system. it won't be cheap, but the performance will blow away 
any disk-based system.


for persistent storage you can replicate from your ram-based system to a
disk-based system, and as long as your replication messages hit disk
quickly you can allow the disk-based version to lag behind in its updates
during your peak periods (as long as it is able to catch up with the
writes overnight), and as the disk-based version won't have to do the
seeks for the reads it will be considerably faster than if it was doing
all the work (especially if you have good, large battery-backed disk
caches to go with those drives to consolidate the writes)



It has become a bit more confusing because I am trying to allot shared
storage across several hosts, and want to be careful not to overload one
of the 2 storage processors.


there's danger here: if you share spindles with other apps you run the
risk of slowing down your database significantly. you may be better off
with fewer but dedicated drives rather than more but shared drives.


David Lang




Re: [PERFORM] Overriding the optimizer

2005-12-17 Thread David Lang

On Fri, 16 Dec 2005, Mark Kirkwood wrote:


Craig A. James wrote:



What would be cool would be some way the developer could alter the plan, 
but they way of doing so would strongly encourage the developer to send the 
information to this mailing list.  Postgres would essentially say, "Ok, you 
can do that, but we want to know why!"




Yeah it would - an implementation I have seen that I like is where the 
developer can supply the *entire* execution plan with a query. This is 
complex enough to make casual use unlikely :-), but provides the ability to 
try out other plans, and also fix that vital query that must run today.


hmm, I wonder if this option would have uses beyond the production hacks
that are being discussed.

specifically, developers working on the optimizer (or related things like
clustered databases) could use the same hooks to develop and modify the
'optimizer' externally to postgres (doing an explain would let them find
the costs that postgres thinks each option has, along with its
recommendation, but the developer could try different execution plans
without having to recompile postgres between runs. and for clustered
databases where the data is split between machines this would be a hook
that the cluster engine could use to put its own plan into place without
having to modify and recompile)


David Lang



Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread David Lang

On Thu, 15 Dec 2005, Craig A. James wrote:

The example I raised in a previous thread, of irregular usage, is the same: I 
have a particular query that I *always* want to be fast even if it's only 
used rarely, but the system swaps its tables out of the file-system cache, 
based on "low usage", even though the "high usage" queries are low priority. 
How can Postgres know such things when there's no way for me to tell it?


actually, postgres doesn't manage the file-system cache; it deliberately
leaves that job up to the OS it is running on.

one (extremely ugly) method would be to write a trivial program that looks
up which files are used to store your high-priority tables and then keeps
those files in memory (it may be as simple as mmaping the files and then
going to sleep, or you may have to read various points through the file to
keep them current in the cache; it WILL vary depending on your OS and
filesystem in use)
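
a crude sketch of what I mean, in shell rather than mmap (database, table
name and interval are made up; the relfilenode lookup is how you find the
files under $PGDATA/base):

   oid=`psql -At -d mydb -c "select relfilenode from pg_class where relname = 'hot_table'"`
   while true; do
       # re-read the table's files so they stay in the OS cache
       cat $PGDATA/base/*/$oid $PGDATA/base/*/$oid.* > /dev/null 2>/dev/null
       sleep 60
   done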


oracle goes to extremes with this sort of control. I'm actually mildly
surprised that they still run on a host OS and haven't completely taken
over the machine (I guess they don't want to have to write device drivers;
that's about the only OS code they really want to use, since they do their
own memory management, filesystem, and user systems). by avoiding areas
like this postgres sacrifices a bit of performance, but gains a much
broader set of platforms (hardware and OS) that it can run on, and this by
itself can result in significant wins (does oracle support Opteron CPUs in
64 bit mode yet? as of this summer it just wasn't an option)


David Lang



Re: [PERFORM] Simple Join

2005-12-15 Thread David Lang

On Fri, 16 Dec 2005, Mark Kirkwood wrote:



Right on. Some of these "coerced" plans may perform much better. If so, we 
can look at tweaking your runtime config: e.g.


effective_cache_size
random_page_cost
default_statistics_target

to see if said plans can be chosen "naturally".


Mark, I've seen these config options listed as tweaking targets fairly 
frequently, has anyone put any thought or effort into creating a test 
program that could analyse the actual system and set the defaults based on 
the measured performance?


David Lang



Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread David Lang

On Thu, 15 Dec 2005, Craig A. James wrote:


Mark Kirkwood wrote:
I hear what you are saying, but to use this fine example - I don't know
what the best plan is - these experiments are part of an investigation to
find *if* there is a better plan, and if so, why Postgres is not finding it.


There isn't a database in the world that is as smart as a developer, or 
that can have insight into things that only a developer can possibly know.


That is often true - but the aim is to get Postgres's optimizer closer to 
developer smartness.


What would be cool would be some way the developer could alter the plan, but
the way of doing so would strongly encourage the developer to send the
information to this mailing list.  Postgres would essentially say, "Ok, you
can do that, but we want to know why!"


At the risk of sounding flippant (which is NOT what I intend), I will point
out that with the source you can change the optimizer any way you need to
:-)


That being said, in your example the issue is the cost of the user-created
function and the fact that Postgres doesn't know its cost.


Would a reasonable answer be to give Postgres a way to learn how expensive
the call is?


A couple of ways I could see to do this:

1. store some stats automagically when the function is called, and update
the optimization plan when you do an ANALYSE


2. provide a way for a user to explicitly set a cost factor for a function
(with a default value that's sane for fairly trivial functions, so that it
would only have to be set for unusually expensive functions)


Now, neither of these will work all the time if a given function is
sometimes cheap and sometimes expensive (depending on its parameters),
but in that case I would say that if the application knows a function
will be unusually expensive under some conditions (and knows what those
conditions will be), it may be reasonable to duplicate the function: one
copy that it uses most of the time, and a second copy that it uses when it
expects the call to be expensive. At that point the cost of each copy can
be set via either of the methods listed above.
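
As a sketch of how you might come up with the number to plug into option 2,
you can measure the per-call overhead yourself; psycopg2 and the table and
function names below are all placeholders:

import time
import psycopg2   # any client library would do; psycopg2 is just an example

# Compare a scan that calls the user-defined function against the same scan
# without it, to estimate the extra cost per call.
conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

def timed(sql):
    start = time.time()
    cur.execute(sql)
    cur.fetchall()
    return time.time() - start

baseline = timed("SELECT col FROM mytable")
with_func = timed("SELECT expensive_func(col) FROM mytable")
cur.execute("SELECT count(*) FROM mytable")
rows = cur.fetchone()[0]

print("extra cost per call: %.6f ms" % (1000.0 * (with_func - baseline) / rows))
conn.close()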


After years of using several other database products (some supporting hint 
type constructs and some not), I have come to believe that hinting (or 
similar) actually *hinders* the development of a great optimizer.


I agree.  It takes the pressure off the optimizer gurus.  If the users can 
just work around every problem, then the optimizer can suck and the system is 
still usable.


Lest anyone think I'm an all-out advocate of overriding the optimizer, I know 
from first-hand experience what a catastrophe it can be.  An Oracle hint I 
used worked fine on my test schema, but the customer's "table" turned out to 
be a view, and Oracle's optimizer worked well on the view whereas my hint was 
horrible.  Unfortunately, without the hint, Oracle sucked when working on an 
ordinary table.  Hints are dangerous, and I consider them a last resort.


I've been on the linux-kernel mailing list for the last 9 years, and have
seen a similar debate rage during that entire time about kernel memory
management. Overall, both of these tend to be conflicts between short-term
and long-term benefits.


In the short term, the application user wants to be able to override the
system to get the best performance _now_.


In the long run, the system designers don't trust the application
programmers to get the hints right, and want the optimizer to figure out
the right plan, even if it takes a lot longer to get there.


The key to this balance seems to be to work towards as few controls as
possible, because users will get them wrong far more often than they get
them right. But when you hit a point where there's absolutely no way for
the system to figure things out (and the difference is drastic), provide
the application with a way to hint to the system that things are unusual,
while always looking for patterns that would let the system detect the
need itself.


Even the existing defaults are wrong as often as they are right (they were
set when hardware was very different than it is today), so some way to
gather real-world stats and set the system defaults based on actual
hardware performance is really the right way to go (even for things like
sequential scan speed that are set in the config file today).


David Lang



Re: [PERFORM] opinion on disk speed

2005-12-08 Thread David Lang

On Thu, 8 Dec 2005, Vivek Khera wrote:


I have a choice to make on a RAID enclosure:

14x 36GB 15kRPM ultra 320 SCSI drives

OR

12x 72GB 10kRPM ultra 320 SCSI drives

both would be configured into RAID 10 over two SCSI channels using a megaraid 
320-2x card.


My goal is speed.  Either would provide more disk space than I would need 
over the next two years.


The database does a good number of write transactions, and a decent number of 
sequential scans over the whole DB (about 60GB including indexes) for large 
reports.


My only concern is the 10kRPM vs 15kRPM.  The advantage of the 10k disks is 
that it would come from the same vendor as the systems to which it will be 
connected, making procurement easier.


If space isn't an issue, then you fall back on the old standby rules of
thumb:


more spindles are better (more disk heads that can move around
independently)


faster drives are better (less time to read or write a track)

So the 15k drive option is better.

One other note: you probably don't want to use all the disks in one RAID 10
array; split a pair of them off into a separate RAID 1 array and put your
WAL on it.


David Lang




Re: [PERFORM] two disks - best way to use them?

2005-12-06 Thread David Lang

On Tue, 6 Dec 2005, Thomas Harold wrote:


Ron wrote:


For accuracy's sake, which exact config did you finally use?

How did you choose the config you finally used?  Did you test the three 
options or just pick one?


(Note: I'm not the original poster.)

I just picked the option of putting the data/pg_xlog directory (WAL) on a 2nd 
set of spindles.  That was the easiest thing for me to change on this test 
box.


The test server is simply a Gentoo box running software RAID and LVM2. The 
primary disk set is 2x7200RPM 300GB drives and the secondary disk set is 
2x5400RPM 300GB drives.  Brand new install of PGSQL 8.1, with mostly default 
settings (I changed FSM pages to be a higher value, max_fsm_pages = 15). 
PGSQL was given it's own ext3 32GB LVM volume on the primary disk set 
(2x7200RPM).  Originally, all files were on the primary disk.


The WAL is more sensitive to drive speeds than the data is, so you may
pick up a little more performance by switching the WAL to the 7200 rpm
drives instead of the 5400 rpm drives.


If you see a noticeable difference with this, consider buying a pair of
smaller but faster drives (10k or 15k rpm drives, or a solid-state drive).
You can test the upper bound (with significant data risk) by putting the
WAL on a ramdisk and seeing what your performance looks like.


David Lang



Re: [PERFORM] BLCKSZ

2005-12-06 Thread David Lang

On Tue, 6 Dec 2005, Steinar H. Gunderson wrote:


On Tue, Dec 06, 2005 at 01:40:47PM +0300, Olleg wrote:

I can't undestand why "bigger is better". For instance in search by
index. Index point to page and I need load page to get one row. Thus I
load 8kb from disk for every raw. And keep it then in cache. You
recommend 64kb. With your recomendation I'll get 8 times more IO
throughput, 8 time more head seek on disk, 8 time more memory cache (OS
cache and postgresql) become busy.


Hopefully, you won't have eight times the seeking; a single block ought to be
in one chunk on disk. You're of course at your filesystem's mercy, though.


In fact it would usually mean 1/8 as many seeks: since the 64k chunk is
created all at once, it's probably going to be one chunk on disk, as
Steinar points out, and that means you do one seek per 64k instead of one
seek per 8k.


With current disks it's getting to the point where it's the same cost to
read 8k as it is to read 64k (i.e. almost free; you could read
substantially more than 64k and not notice it in I/O speed). It's the
seeks that are expensive.


Yes, it will eat up more RAM, but assuming you are likely to need other
things nearby, it's likely to be a win.


As processor speed keeps climbing compared to memory and disk speed, true
random access is really not the correct way to think about I/O anymore.
It's frequently more appropriate to think of your memory and disks as if
they were tape drives (seek, then read, repeat).


Even for memory access, what you really do is seek to the beginning of a
block (expensive), read that block into cache (cheap; you get the entire
cache line of 64-128 bytes whether you need it or not), and then access
that block fairly quickly. With memory on SMP machines it's a constant
cost to seek anywhere in memory; with NUMA machines (including multi-socket
Opterons) the cost of the seek and fetch depends on where in memory you are
seeking to and which CPU you are running on. It also becomes very expensive
for multiple CPUs to write to memory addresses that are in the same block
(cache line) of memory.


For disks it's even more dramatic: the seek is incredibly expensive
compared to the read/write, and the cost of the seek varies with how far
you need to seek, but once you are on a track you can read the entire
track in for about the same cost as a single block (in fact the drive
usually does read the entire track before sending the one block on to
you). RAID complicates this because you have a block size per drive, and
reading more than that block size involves multiple drives.


Most of the work of dealing with these issues and optimizing for them is
the job of the OS. Some other databases work very hard to take over this
work from the OS; Postgres instead tries to let the OS do it, but we still
need to keep it in mind when configuring things, because it's possible to
make it much easier or much harder for the OS to optimize.


David Lang



Re: [PERFORM] two disks - best way to use them?

2005-12-05 Thread David Lang

On Mon, 5 Dec 2005, Thomas Harold wrote:

Yeah, I don't think I was clear about the config.  It's (4) disks setup as a 
pair of RAID1 sets.  My original config was pgsql on the first RAID set (data 
and WAL).  I'm now experimenting with putting the data/pg_xlog folder on the 
2nd set of disks.


Under the old setup (everything on the original RAID1 set, in a dedicated 
32GB LVM volume), I was seeing 80-90% wait percentages in "top".  My 
understanding is that this is an indicator of an overloaded / bottlenecked 
disk system.  This was while doing massive inserts into a test table 
(millions of narrow rows).  I'm waiting to see what happens once I have 
data/pg_xlog on the 2nd disk set.


In that case you logically have two disks, so see the post from Ron earlier
in this thread.


David Lang



Re: [PERFORM] two disks - best way to use them?

2005-12-05 Thread David Lang

On Mon, 5 Dec 2005, Thomas Harold wrote:


(noob question incoming)

Section 26.4 WAL Internals
http://www.postgresql.org/docs/8.1/interactive/wal-internals.html

This seems to be the applicable chapter.  They talk about creating a symlink 
for the data/pg_xlog folder to point at another disk set.


If I have (2) RAID1 sets with LVM2, can I instead create a logical volume on 
the 2nd disk set and just mount data/pg_xlog to point at the logical volume 
on the 2nd disk set?


For example, I have an LVM on my primary mirror called 'pgsql'.  And I've 
created a 2nd LVM on my secondary mirror called 'pgxlog'.  These are mounted 
as:


/dev/vgraida/pgsql on /var/lib/postgresql type ext3 (rw,noatime)

/dev/vgraidb/pgxlog on /var/lib/postgresql/data/pg_xlog type ext3 
(rw,noatime)


From the application's P.O.V., it's the same thing, right? (It seems to be 
working, I'm just trying to double-check that I'm not missing something.)




The application can't tell the difference, but the reason for separating
them isn't for the application; it's so that different pieces of hardware
can work on different things without having to bounce back and forth
between them.


Using the same physical drives via LVM doesn't achieve this goal.

The problem is that the WAL is doing a LOT of writes, and Postgres waits
until each write is completed before going on to the next thing (for
safety). If a disk is dedicated to the WAL, the head doesn't move much; if
the disk is used for other things as well, the heads have to move across
the disk surface between the WAL and wherever the data is. This drastically
slows down the rate at which items can go into the WAL, and therefore slows
down the entire system.


This slowdown isn't even something as simple as cutting your speed in half
(half the time spent working on the WAL, half on the data itself); it's
more like 10% spent on the WAL, 10% spent on the data, and 80% moving back
and forth between them (I am probably wrong on the exact numbers, but it is
something similarly drastic).


This is also why it's so good to have a filesystem journal on a different
drive.


David Lang



Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Sat, 3 Dec 2005, Luke Lonergan wrote:


Tom,

On 12/3/05 12:32 PM, "Tom Lane" <[EMAIL PROTECTED]> wrote:


"Luke Lonergan" <[EMAIL PROTECTED]> writes:

Last I looked at the Postgres binary dump format, it was not portable or
efficient enough to suit the need.  The efficiency problem with it was that
there was descriptive information attached to each individual data item, as
compared to the approach where that information is specified once for the
data group as a template for input.


Are you complaining about the length words?  Get real...


Hmm - "" repeat, efficiency is 1/2 of "" repeat.  I
think that's worth complaining about.


But how does it compare to the ASCII representation of that int? (Remember
to include your separator characters as well.)


Yes, it seems less efficient, and it may be better to do something like
send a record description header that gives the sizes of each item and
then send the records that follow without the per-item size words, but
either way should still be an advantage over the existing ASCII messages.
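
A toy illustration of the difference between the two framing styles (this
is NOT the actual Postgres binary COPY format, just a sketch):

import struct

# A row of three int4 values.
row = (42, 7, 123456)

# Style 1: every field carries its own 4-byte length word.
per_field = b"".join(struct.pack("!ii", 4, v) for v in row)

# Style 2: a one-time record description (field count plus per-field sizes),
# then each row is sent as bare values.
header = struct.pack("!h", len(row)) + struct.pack("!hhh", 4, 4, 4)
bare_row = struct.pack("!iii", *row)

print(len(per_field), "bytes per row with per-field length words")
print(len(bare_row), "bytes per row after a", len(header), "byte header")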


also, how large is the  in the message?

There are other optimizations that can be done as well, but if there's
still a question about whether it's worth doing the parsing on the client,
then a first implementation should be done without making too many
changes, just to test things.


Also, some of the optimizations need measurements done to see if they are
worth it (even something that seems as obvious as separating the sizes
from the data itself, as you suggest above, has a penalty: it spreads the
data that needs to be accessed to process a line across different cache
lines, so in some cases it won't be worth it).


David Lang



Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:

And how do we compose the binary data on the client?  Do we trust that 
the client encoding conversion logic is identical to the backend's?  If 
there is a difference, what happens if the same file loaded from 
different client machines has different results?  Key conflicts when 
loading a restore from one machine and not from another? - Luke


The same way you deal with text data that could be in different encodings:
you tag your message with the format version you are using, and throw an
error if you get a format you don't understand.


If a client claims to be using one format but is instead doing something
different, you will be in deep trouble anyway.


Remember, we aren't talking about random application code here; we are
talking about Postgres client code and libraries. If the library is
incorrect, then it's a bug; parsing bugs could happen in the server as
well. (In fact, the server could parse things to the intermediate format
and then convert them; this sounds expensive, but given the high clock
multipliers in use, it may not end up being measurable.)


David Lang



Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:


Michael,

On 12/2/05 1:46 PM, "Michael Stone" <[EMAIL PROTECTED]> wrote:


Not necessarily; you may be betting that it's more *efficient* to do the
parsing on a bunch of lightly loaded clients than your server. Even if
you're using the same code this may be a big win.


If it were possible in light of the issues on client parse / convert, then
we should analyze whether it's a performance win.

In the restore case, where we've got a dedicated server with a dedicated
client machine, I don't see why there would be a speed benefit from running
the same parse / convert code on the client versus running it on the server.
Imagine a pipeline where there is a bottleneck, moving the bottleneck to a
different machine doesn't make it less of a bottleneck.


Your database server needs to use its CPU for other things besides the
parsing. You could buy a bigger machine, but it's usually far cheaper to
buy two dual-proc machines than one quad-proc machine (and if your load is
such that you already have an 8-proc machine as the database, swallow hard
when you ask for the price of a 16-proc machine). In addition, there is a
substantial efficiency loss in multi-proc machines (some software, some
hardware), which may give you more available work cycles on the multiple
small machines.


If you can remove almost all the parsing load (CPU cycles, memory
footprint, and cache-thrashing effects), then that box can do the rest of
its work more efficiently, while the client uses what would otherwise be
idle CPU to do the parsing.


If you only have a 1-1 relationship, it's a good question whether it's a
win (it depends on how much other work each box has to do to support
this), but if you allow for multiple clients it easily becomes a win.


David Lang



Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:


Stephen,

On 12/2/05 1:19 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote:



I've used the binary mode stuff before, sure, Postgres may have to
convert some things but I have a hard time believing it'd be more
expensive to do a network_encoding -> host_encoding (or toasting, or
whatever) than to do the ascii -> binary change.


From a performance standpoint no argument, although you're betting that you
can do parsing / conversion faster than the COPY core in the backend can (I
know *we* can :-).  It's a matter of safety and generality - in general you
can't be sure that client machines / OS'es will render the same conversions
that the backend does in all cases IMO.


One more thing - this is really about the lack of a cross-platform binary
input standard for Postgres IMO.  If there were such a thing, it *would* be
safe to do this.  The current Binary spec is not cross-platform AFAICS, it
embeds native representations of the DATUMs, and does not specify a
universal binary representation of same.

For instance - when representing a float, is it an IEEE 32-bit floating
point number in little endian byte ordering? Or is it IEEE 64-bit?  With
libpq, we could do something like an XDR implementation, but the machinery
isn't there AFAICS.


This makes sense; however, it then raises the question of how much effort
it would take to define such a standard and implement the shim layer
needed to accept the connections, versus how much of a speedup it would
produce (the gain could probably be approximated with just a little
hacking to use the existing binary format between two machines of the same
type).


As for the standards, standard network byte order is big endian, so that
should be the standard used (in spite of the quantity of x86 machines out
there). For the size of the data elements, using the largest size of each
will probably still be a win compared to ASCII. Converting between binary
formats is usually a matter of a few AND and shift opcodes (and with the
core so much faster than its memory, you can afford to do quite a few of
these on each chunk of data without it being measurable in your overall
time).
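
A tiny sketch of the "agree on network byte order" idea; the struct format
strings here are just an illustration, not a proposed wire format:

import struct
import sys

# The client always packs values in network (big-endian) byte order; the
# server unpacks with the same format string and gets native integers back
# regardless of its own endianness.  '!' means network order in struct.
values = (1, 70000, 2**31 - 1)

wire = struct.pack("!iii", *values)        # client side: encode for the wire
decoded = struct.unpack("!iii", wire)      # server side: one cheap byte swap

print("this host is", sys.byteorder, "endian")
print("decoded values:", decoded)
assert decoded == values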


An alternative would be to add a 1-byte type tag before each data element
to specify its type, but then the server-side code would have to be
smarter to deal with the additional possibilities.


David Lang



Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Michael Stone wrote:


On Fri, Dec 02, 2005 at 01:24:31PM -0800, Luke Lonergan wrote:

From a performance standpoint no argument, although you're betting that you
can do parsing / conversion faster than the COPY core in the backend can 


Not necessarily; you may be betting that it's more *efficient* to do the
parsing on a bunch of lightly loaded clients than your server. Even if
you're using the same code this may be a big win.


It's a lot easier to throw hardware at the problem by splitting your
incoming data between multiple machines and having them all work in
parallel feeding the data to one database than it is to throw more
hardware at the database server to speed it up (and yes, assuming that MPP
splits the parsing costs as well, it can be an answer for some types of
systems).


David Lang



Re: [PERFORM] Database restore speed

2005-12-03 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:


Stephen,

On 12/2/05 12:18 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote:


Just a thought, but couldn't psql be made to use the binary mode of
libpq and do at least some of the conversion on the client side?  Or
does binary mode not work with copy (that wouldn't suprise me, but
perhaps copy could be made to support it)?


Yes - I think this idea is implicit in what David suggested, and my response
as well.  The problem is that the way the client does conversions can
potentially differ from the way the backend does.  Some of the types in
Postgres are machine intrinsic and the encoding conversions use on-machine
libraries, each of which preclude the use of client conversion methods
(without a lot of restructuring).  We'd tackled this problem in the past and
concluded that the parse / convert stage really belongs in the backend.


I'll bet this parsing cost varies greatly with the data types used. I'm
also willing to bet that for the data types that have different encodings
on different systems, there could be an intermediate encoding that is far
faster to parse than ASCII text.


For example (and I know nothing about the data storage itself, so this is
just an example), if the issue were storing numeric values on big-endian
and little-endian systems (and 32-bit vs 64-bit systems, to end up with
four ways of holding the data), you have a substantial cost in parsing the
ASCII and converting it to a binary value, but the client can't (and
shouldn't) know which endianness and word size the server uses. It could,
however, create a big-endian multi-precision encoding that would then be
very cheap for the server to split and flip as necessary. Yes, this means
more work is done overall, but it's split between different machines, and
the binary representation of the data will probably reduce your network
traffic as a side effect.


And for things like dates, which get parsed in multiple ways until one is
found that seems sane, there's a significant amount of work that the
server could avoid.
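
A rough sketch of that kind of trial-and-error date parsing; the candidate
format list here is made up and far shorter than what a real input routine
has to consider:

from datetime import datetime

# Try formats until one sticks; every wrong guess costs a parse attempt.
CANDIDATES = ["%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %b %Y"]

def parse_date(text):
    for fmt in CANDIDATES:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue            # wrong guess: pay the cost and try the next
    raise ValueError("unparseable date: %r" % text)

print(parse_date("2005-12-03"))
print(parse_date("03 Dec 2005"))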


David Lang


The other thought, of course, is that you could use PITR for your
backups instead of pgdump...


Totally - great idea, if this is actually a backup / restore then PITR plus
filesystem copy (tarball) is hugely faster than dump / restore.

- Luke





Re: [PERFORM] filesystem performance with lots of files

2005-12-02 Thread David Lang

On Fri, 2 Dec 2005, Qingqing Zhou wrote:



I don't have all the numbers readily available (and I didn't do all the
tests on every filesystem), but I found that even with only 1000
files/directory ext3 had some problems, and if you enabled dir_hash some
functions would speed up, but writing lots of files would just collapse
(that was the 80 min run)



Interesting. I would suggest testing a small number of bigger files if the
target is a database performance comparison. By small number, I mean
10^2 - 10^3; by bigger, I mean file sizes from 8k to 1G (a PostgreSQL data
file is at most this size under a normal installation).


I agree. That round of tests was done on my system at home, and was in
response to a friend who had an rsync over a local LAN take > 10 hours for
< 10G of data, but even so it generated some interesting info. I need to
make a more controlled run at it, though.



Let's take TPCC as an example, if we get a TPCC database of 500 files,
each one is at most 1G (PostgreSQL has this feature/limit in ordinary
installation), then this will give us a 500G database, which is big enough
for your current configuration.

Regards,
Qingqing





Re: [PERFORM] Database restore speed

2005-12-01 Thread David Lang

On Fri, 2 Dec 2005, Luke Lonergan wrote:


Steve,


When we restore the postmaster process tries to use 100% of the CPU.

The questions we have are:

1) What is postmaster doing that it needs so much CPU?


Parsing mostly, and attribute conversion from text to DBMS native
formats.


2) How can we get our system to go faster?


Use Postgres 8.1 or Bizgres.  Get a faster CPU.

These two points are based on our work to improve COPY speed, which led
to a near doubling in Bizgres, and in the 8.1 version it's about 60-70%
faster than in Postgres 8.0.

There are currently two main bottlenecks in COPY, one is parsing +
attribute conversion (if the postgres CPU is nailed at 100% that's what
your limit is) and the other is the write speed through the WAL.  You
can roughly divide the write speed of your disk by 3 to get that limit,
e.g. if your disk can write 8k blocks at 100MB/s, then your COPY speed
might be limited to 33MB/s.  You can tell which of these limits you've
hit using "vmstat 1" on Linux or iostat on Solaris and watch the blocks
input/output on your disk while you watch your CPU.


Luke, would it help to have one machine read the file and connect to
Postgres on a different machine when doing the COPY? (I'm thinking that
the first machine may be able to do a lot of the parsing and conversion,
leaving the second machine to worry just about doing the writes.)
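
A rough sketch of the shape of "read on one machine, COPY to another";
psycopg2, the file path, the host, and the table name are just
placeholders:

import psycopg2   # client library is only an example

# Run this on the client box: it reads the dump file from local disk and
# streams it to the remote server with COPY FROM STDIN, so the reading
# (and any client-side preprocessing) uses the client's CPU.
conn = psycopg2.connect(host="db-server", dbname="restore_target")
cur = conn.cursor()

with open("/data/dump/mytable.copy") as f:
    cur.copy_expert("COPY mytable FROM STDIN", f)

conn.commit()
conn.close()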


David Lang



Re: [PERFORM] 15,000 tables

2005-12-01 Thread David Lang

On Thu, 1 Dec 2005, Craig A. James wrote:

So say I need 10,000 tables, but I can create tablespaces.  Wouldn't that 
solve the performance problem caused by Linux's (or ext2/3's) problems with 
large directories?


For example, if each user creates (say) 10 tables, and I have 1000 users, I 
could create 100 tablespaces, and assign groups of 10 users to each 
tablespace.  This would limit each tablespace to 100 tables, and keep the 
ext2/3 file-system directories manageable.


Would this work?  Would there be other problems?


This would definitely help; however, there's still the question of how
large the tables get, and how many total files are needed to hold the 100
tables.


You still have the problem of having to seek around to deal with all these
different files (and tablespaces just spread them further apart). You
can't solve this, but a large write-back journal (as opposed to
metadata-only journaling) would mask the problem.


It would be a trade-off: you would end up writing all your data twice, so
throughput would be lower, but since the data is safe as soon as it hits
the journal, the latency for any one request would be lower, which would
let the system use the CPU more and overlap it with the seeking.


David Lang



Re: [PERFORM] Open request for benchmarking input (fwd)

2005-12-01 Thread David Lang
Here are the suggestions from the MySQL folks; what additional tests
should I do?


I'd like to see some tests submitted that map out when not to use a
particular database engine, so if you have a test that you know a
particular database chokes on, let me know (bonus credibility if you
include tests that your own database has trouble with :)


David Lang

-- Forwarded message -- Date: Thu, 01 Dec 2005 16:14:25

David,

The choice of benchmark depends on what kind of application you would
like to see performance for.

When someone claims that one database is faster than another in general,
it makes me smile.  That would be the same as claiming one car would be
able to win every competition, from Formula 1 to off-road racing.

There are certain well-known cases where MySQL will be faster - for
example, the in-memory storage engine is hard to beat for point selects,
or bulk inserts into MyISAM (no transactional overhead).

There are certain known cases where MySQL does not perform well - it is
easy to build a query using subqueries that would be horribly slow on
MySQL but decent on PostgreSQL... but then, writing an application for
MySQL you would not write such a query.


I think the most database-agnostic way would be to select the "workload"
from the user's point of view and have it implemented in the most
efficient way for the database in question - for example, you may find
that TPC-C implementations by different vendors differ a lot.




For my own interests, I would like to at least cover the following bases:
32-bit vs 64-bit vs 64-bit kernel + 32-bit user space; data warehouse type
tests (data >> memory); and web performance tests (active data << RAM).


You may grab the Dell DVD Store:

http://linux.dell.com/dvdstore/

for a web benchmark.  It does not have PostgreSQL built in, but there are
some implementations available on the Internet.

DBT2 by OSDL is another good candidate - it supports PostgreSQL and
MySQL natively.


If you want raw performance numbers, such as selects/sec, you may use
SysBench - http://sysbench.sourceforge.net


For data warehouse workloads you could grab TPC-H or the DBT3
implementation by OSDL - we run this successfully with MySQL.

You could also take a look at http://benchw.sourceforge.net/





Re: [PERFORM] filesystem performance with lots of files

2005-12-01 Thread David Lang

On Thu, 1 Dec 2005, Qingqing Zhou wrote:


"David Lang" <[EMAIL PROTECTED]> wrote


a few weeks ago I did a series of tests to compare different filesystems.
the test was for a different purpose so the particulars are not what I
woud do for testing aimed at postgres, but I think the data is relavent)
and I saw major differences between different filesystems, I'll see aobut
re-running the tests to get a complete set of benchmarks in the next few
days. My tests had their times vary from 4 min to 80 min depending on the
filesystem in use (ext3 with hash_dir posted the worst case). what testing
have other people done with different filesystems?



That's good ... what benchmarks did you use?


I was doing the testing in the context of a requirement to sync over a
million small files from one machine to another (rsync would take > 10
hours to do this over a 100Mb network, so I started with the question
'how long would it take to do a tar-ftp-untar cycle with no smarts'). I
created 1M x 1K files in a three-deep directory tree
(10 dirs / 10 dirs / 10 dirs / 1000 files) and measured simple things:
time to copy the tree, time to create a tar, time to extract from the tar,
time to copy the tarfile (a 1.6G file). I flushed the memory between each
test with cat largefile > /dev/null (I know now that I should have
unmounted and remounted between each test), with source and destination on
different IDE controllers.


I don't have all the numbers readily available (and I didn't do all the
tests on every filesystem), but I found that even with only 1000
files/directory ext3 had some problems, and if you enabled dir_hash some
operations would speed up, but writing lots of files would just collapse
(that was the 80-minute run).


I'll have to script it and re-do the tests (and when I do, I'll also set
it up to test with far fewer, far larger files as well).
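
A scaled-down sketch of what that scripted harness could look like (counts
and sizes are tiny here just to keep the example short; this is not the
exact test described above):

import os, shutil, tarfile, time

# Create files in a nested tree, then time a few whole-tree operations.
ROOT = "fstest"
DIRS, FILES_PER_DIR, FILE_SIZE = 10, 100, 1024

def timed(label, func):
    start = time.time()
    func()
    print("%-15s %.1f s" % (label, time.time() - start))

def create_tree():
    for d in range(DIRS):
        path = os.path.join(ROOT, "d%02d" % d)
        os.makedirs(path)
        for f in range(FILES_PER_DIR):
            with open(os.path.join(path, "f%04d" % f), "wb") as fh:
                fh.write(b"x" * FILE_SIZE)

def tar_tree():
    with tarfile.open(ROOT + ".tar", "w") as t:
        t.add(ROOT)

def copy_tree():
    shutil.copytree(ROOT, ROOT + ".copy")

timed("create files", create_tree)
timed("tar tree", tar_tree)
timed("copy tree", copy_tree)

# Clean up (a real run would also drop caches or remount between steps).
shutil.rmtree(ROOT); shutil.rmtree(ROOT + ".copy"); os.remove(ROOT + ".tar")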


David Lang



Re: [PERFORM] About the relation between fragmentation of file and

2005-12-01 Thread David Lang

On Thu, 1 Dec 2005, Richard Huxton wrote:


Tatsumi Abe wrote:

Question is about the relation between fragmentation of file and VACUUM
performance.


OS:RedHat Enterprise Linux AS Release 3(Taroon Update 6)
Kernel 2.4.21-37.ELsmp on an i686
Filesystem Type ext3
Filesystem features:  has_journal filetype needs_recovery sparse_super 
large_file


Try different filesystems; ext2/3 do a very poor job when you have lots of
files in a directory (and 7000+ files is a lot). You can also try mounting
the filesystem with noatime,nodiratime to reduce the seeks when reading,
and try mounting it with oldalloc (which changes how files are arranged on
disk when they are written and extended). I've seen drastic speed
differences between ext2 and ext3 based on this option (ext2 defaults to
oldalloc, ext3 defaults to orlov, which is faster in many cases).



CPU:Intel(R) Xeon(TM) CPU 2.80GHz stepping 01
Memory:2.0GB
HDD:80GB(S-ATA)
 SATA max UDMA/133
PostgreSQL:7.3.8


1. Approx. there are 3500 tables in the DB



When the performance of inserting data was measured in the above-
mentioned environment, it takes six minutes to write 1 lines
after 4/5 days the measurement had begun. While searching the reason
of bottleneck by executing iostat command it is understood that DISK I/O
was problem for the neck as %iowait was almost 100% at that time.

On the very first day processing time of VACUUM is not a problem but
when the day progress its process time is increasing.Then I examined the
fragmentation of database area(pgsql/data/base) by using the following tools.

Disk Allocation Viewer
http://sourceforge.net/projects/davtools/

Fragmentation rate is 28% before defrag.


I'd guess the root of your problem is the number of tables (3500), which
if each has one index represents at least 7000 files. That means a lot
of your I/O time will probably be spent moving the disk heads between
the different files.


Depending on the size of the tables it can actually be a lot worse than
this (remember Postgres splits the tables into fixed-size chunks).


When Postgres adds data it will eventually spill over into additional
files. When you do a VACUUM, does it rewrite the tables into a smaller
number of files, or just rewrite the individual files (making each of them
smaller, but keeping the same number of files)?


Speaking of this, the selection of the size of these chunks is a
compromise between the time needed to seek within an individual file and
the number of files that are created. Is there an easy way to tinker with
this? (I am sure the default is not correct for all filesystems; their
handling of large and/or many files differs drastically.)



You say you can't stop the server, so there's no point in thinking about
a quick hardware upgrade to help you. Also a version-upgrade is not
do-able for you.


there's a difference between stopping the server once for an upgrade 
(hardware or software) and having to stop it every few days to defrag 
things forever after.


David Lang


I can only think of two other options:
1. Change the database schema to reduce the number of tables involved.
I'm assuming that of the 3500 tables most hold the same data but for
different clients (or something similar). This might not be practical
either.

2. Re-order how you access the database. ANALYSE the updated tables
regularly, but only VACUUM them after deletions. Group your inserts so
that all the inserts for table1 go together, then all the inserts for
table2 go together and so on. This should help with the fragmentation by
making sure the files get extended in larger chunks.

Are you sure it's not possible to spend 15 mins offline to solve this?
--
 Richard Huxton
 Archonet Ltd



[PERFORM] filesystem performance with lots of files

2005-12-01 Thread David Lang
This subject has come up a couple of times just today (and it looks like
one that keeps popping up).


Under Linux, ext2/3 have two known weaknesses (or rather, one weakness with
two manifestations): searching through large objects on disk is slow. This
applies both to directories (creating, opening, or deleting files if there
are, or have been, lots of files in a directory) and to files (seeking to
the right place in a file).


The rule of thumb I have used for years is that if files get over a few
tens of megs, or directories get over a couple of thousand entries, you
will start slowing down.


Common places you can see this (outside of Postgres):

1. directories: mail or news storage.
  If you let your /var/spool/mqueue directory get large (for example on a
server that can't send mail for a while, or where mail gets misconfigured),
there may only be a few files in there after it gets fixed, but if the
directory was once large, just doing an ls on it will be slow.


  News servers that store each message as a separate file suffer from this
as well; they work around it by using multiple layers of nested
directories so that no directory has too many files in it (navigating the
layers of directories costs as well; it's all about the tradeoffs). Mail
servers that use maildir (and Cyrus, which uses a similar scheme) have the
same problem.


  To fix this you have to create a new directory, move the files into it,
and then rename the new directory to the old name.


  ext3 has an option to make searching directories faster (htree), but 
enabling it kills performance when you create files. And this doesn't help 
with large files.


2. files: mbox-formatted mail files and log files.
  As these files get large, the process of appending to them takes more
time. syslog makes this very easy to test: on a box that does synchronous
syslog writes (the default for most systems using standard syslog; on
Linux, make sure there is not a '-' in front of the logfile name), time how
long it takes to write a bunch of syslog messages, then make the log file
large and time it again.
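
A quick sketch of that append-timing test done directly against a plain
file instead of through syslog (whether the large file is actually slower
will depend on the filesystem):

import os, time

# Time appending (with fsync, like synchronous syslog) to a small file and
# then to the same file after it has been grown large.
LOG = "append_test.log"
MESSAGES = 500
LINE = b"Dec  1 12:00:00 host test: hello world\n"

def timed_appends():
    start = time.time()
    fd = os.open(LOG, os.O_WRONLY | os.O_APPEND | os.O_CREAT)
    for _ in range(MESSAGES):
        os.write(fd, LINE)
        os.fsync(fd)            # synchronous, like non-'-' syslog entries
    os.close(fd)
    return time.time() - start

print("appends to small file: %.2f s" % timed_appends())

with open(LOG, "ab") as f:       # grow the file to roughly half a gigabyte
    for _ in range(512):
        f.write(b"x" * (1024 * 1024))

print("appends to large file: %.2f s" % timed_appends())
os.remove(LOG)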


A few weeks ago I did a series of tests to compare different filesystems.
The test was for a different purpose, so the particulars are not what I
would do for testing aimed at Postgres, but I think the data is relevant,
and I saw major differences between the filesystems. I'll see about
re-running the tests to get a complete set of benchmarks in the next few
days. My tests had their times vary from 4 min to 80 min depending on the
filesystem in use (ext3 with hash_dir posted the worst case). What testing
have other people done with different filesystems?


David Lang



Re: [PERFORM] 15,000 tables

2005-12-01 Thread David Lang

On Thu, 1 Dec 2005, Michael Riess wrote:


Hi,

we are currently running a postgres server (upgraded to 8.1) which has one 
large database with approx. 15,000 tables. Unfortunately performance suffers 
from that, because the internal tables (especially that which holds the 
attribute info) get too large.


Is it because the internal tables get large, or is it a problem with disk
I/O?


With 15,000 tables you are talking about a LOT of files to hold these
(30,000 files, assuming one index each and each table being small enough
not to need more than one file to hold it), and on Linux ext2/3 this many
files in one directory will slow you down horribly. Try different
filesystems (from my testing and from other posts it looks like XFS is a
leading contender), and also play around with the tablespaces feature in
8.1 to move things out of the main data directory into multiple
directories. If you do an ls -l on the parent directory, you will see that
the size of the directory is large if it has ever had lots of files in it;
the only way to shrink it is to mv the old directory to a new name, create
a new directory, and move the files from the old directory to the new one.


David Lang




Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread David Lang

On Mon, 28 Nov 2005, Brendan Duddridge wrote:

Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it 
possible to upgrade from Postgres 8.1 to Bizgres?


MPP is the Greenplum proprietary extension to Postgres that spreads the
data over multiple machines (like RAID, but with entire machines rather
than just drives, complete with data replication within the cluster to
survive a machine failing).


For some types of queries they can definitely scale linearly with the
number of machines (other queries are far more difficult, and the overhead
of coordinating the machines shows more; this is one of the key things
that the new version they recently announced the beta for is supposed to
be drastically improving).


Early in the year when I first looked at them their prices were
exorbitant, but Luke says I'm wildly mistaken about their current prices,
so call them for details.


It uses the same interfaces as Postgres, so it should be a drop-in
replacement letting you swap a single server for a cluster.


It's fascinating technology to read about.

I seem to remember reading that one of the other Postgres companies is
also producing a clustered version of Postgres, but I don't remember who,
and I know nothing about them.


David Lang



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-28 Thread David Lang

On Mon, 28 Nov 2005, Brendan Duddridge wrote:


Hi David,

Thanks for your reply. So how is that different than something like Slony2 or 
pgcluster with multi-master replication? Is it similar technology? We're 
currently looking for a good clustering solution that will work on our Apple 
Xserves and Xserve RAIDs.


MPP doesn't just split up the data, it splits up the processing as well.
If you have a 5-machine cluster, each machine holds 1/5 of your data (plus
a backup for one of the other machines), and when you do a query, MPP
slices and dices the query to send a subset of it to each machine, then
gets the responses from all the machines and combines them.


If you have to do a full table scan, for example, each machine only has to
go through 20% of the data.


A Slony or pgcluster setup has each machine holding a full copy of all the
data; only one machine can work on a given query at a time, and if you
have to do a full table scan, one machine needs to read 100% of the data.


In many ways this is the holy grail of databases. Almost all other areas
of computing can now be scaled by throwing more machines at the problem in
a cluster, with each machine working on its piece of the problem, but
databases have had serious trouble doing the same and so have been ruled
by the 'big monster machine'. Oracle has been selling Oracle RAC for a few
years, and reports from people who have used it range drastically (from
'it works great' to 'it's a total disaster'), in part depending on the
types of queries being made.


Greenplum thinks they have licked the problems for the more general case
(and that commodity networks are now fast enough to match disk speeds in
processing the data). If they are right, then when they hit full release
with the new version they should be cracking a lot of the
price/performance records on the big database benchmarks (TPC and
similar), and if their pricing is reasonable, they may be breaking them by
an order of magnitude or more. It's not unusual for the top machines to
spend more than $1,000,000 on just their disk arrays for those systems;
MPP could conceivably put together a cluster of $5K machines that runs
rings around them (and probably will for at least some of the subtests;
the big question is whether they can sweep the board and take the top
spots outright).


They have more details (and marketing material) on their site at
http://www.greenplum.com/prod_deepgreen_cluster.html


Don't get me wrong, I am very impressed with their stuff, but (having
ranted a little here on the list about them) I think MPP and its
performance is a bit off topic for the Postgres performance list (at least
until the Postgres project itself starts implementing similar features :-)


David Lang


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Nov 27, 2005, at 8:09 PM, David Lang wrote:


On Mon, 28 Nov 2005, Brendan Duddridge wrote:

Forgive my ignorance, but what is MPP? Is that part of Bizgres? Is it 
possible to upgrade from Postgres 8.1 to Bizgres?


MPP is the Greenplum propriatary extention to postgres that spreads the 
data over multiple machines, (raid, but with entire machines not just 
drives, complete with data replication within the cluster to survive a 
machine failing)


for some types of queries they can definantly scale lineraly with the 
number of machines (other queries are far more difficult and the overhead 
of coordinating the machines shows more. this is one of the key things that 
the new version they recently announced the beta for is supposed to be 
drasticly improving)


early in the year when I first looked at them their prices were exorbadent, 
but Luke says I'm wildly mistake on their current prices so call them for 
details


it uses the same interfaces as postgres so it should be a drop in 
replacement to replace a single server with a cluster.


it's facinating technology to read about.

I seem to remember reading that one of the other postgres companies is also 
producing a clustered version of postgres, but I don't remember who and 
know nothing about them.


David Lang








Re: [PERFORM] Open request for benchmarking input

2005-11-27 Thread David Lang

On Sun, 27 Nov 2005, Andreas Pflug wrote:


David Lang wrote:


Postgres needs to work on the low end stuff as well as the high end stuff 
or people will write their app to work with things that DO run on low end 
hardware and they spend much more money then is needed to scale the 
hardware up rather then re-writing their app.


I agree that pgsql runs on low end stuff, but a dual Opteron with 2x15kSCSI 
isn't low end, is it? The CPU/IO performance isn't balanced for the total 
cost, you probably could get a single CPU/6x15kRPM machine for the same price 
delivering better TP performance in most scenarios.


Benchmarks should deliver results that are somewhat comparable. If performed 
on machines that don't deliver a good CPU/IO power balance for the type of DB 
load being tested, they're misleading and hardly usable for comparision 
purposes, and even less for learning how to configure a decent server since 
you might have to tweak some parameters in an unusual way.


A couple of things to note:

First, when running benchmarks there is a need for client machines to
stress the database, and these machines are what is available to act as
clients as well as servers.


Second, the smaller machines are actually about what I would spec out for
a high-performance database that's reasonably small. A couple of the boxes
have 144G drives; if they are set up as RAID 1, the boxes would be
reasonable for a database of up to 50G or so (assuming you need space on
the DB server to dump the database; up to 100G or so if you don't).


David Lang



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-26 Thread David Lang

On Sun, 27 Nov 2005, Luke Lonergan wrote:

For data warehousing its pretty well open and shut.  To use all cpus and 
io channels on each query you will need mpp.


Has anyone done the math on the original post?  5TB takes how long to
scan once?  If you want to wait less than a couple of days just for a 
seq scan, you'd better be in the multi-gb per second range.


If you truly need to scan the entire database then you are right; however,
indexes should be able to cut the amount you need to scan drastically.


David Lang



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-26 Thread David Lang

Another thought - I priced out a maxed out machine with 16 cores and
128GB of RAM and 1.5TB of usable disk - $71,000.

You could instead buy 8 machines that total 16 cores, 128GB RAM and 28TB
of disk for $48,000, and it would be 16 times faster in scan rate, which
is the most important factor for large databases.  The size would be 16
rack units instead of 5, and you'd have to add a GigE switch for $1500.

Scan rate for above SMP: 200MB/s

Scan rate for above cluster: 3,200MB/s

You could even go dual core and double the memory on the cluster and
you'd about match the price of the "god box".

- Luke


Luke, I assume you are talking about using the Greenplum MPP for this
(otherwise I don't know how you are combining all the different systems).


If you are, then you are overlooking one very significant factor: the cost
of the MPP software. At $10K/CPU, the cluster has an extra $160K in
software costs, which is double the hardware cost.


If money is no object then go for it, but if it is, then your comparison
would be (ignoring software maintenance costs) the 16-core, 128G RAM
system vs ~3 small systems totaling 6 cores and 48G RAM.


Yes, if scan speed is the bottleneck you still win with the small systems,
but for most other uses the large system would win easily, and in any case
it's not the open-and-shut case you keep presenting it as.


David Lang



[PERFORM] Open request for benchmarking input

2005-11-26 Thread David Lang
These boxes don't look like being designed for a DB server. The first are 
very CPU bound, and the third may be a good choice for very large amounts 
of streamed data, but not optimal for TP random access.


I don't know what you mean when you say the first ones are CPU bound; they
have far more CPU than they do disk I/O.


However, I will agree that they were not designed to be DB servers; they
weren't. They happen to be the machines that I have available.


They only have a pair of disks each, which would not be reasonable for
most production DB uses, and they have far more CPU than is normally
recommended. So I'll have to run RAID 0 instead of 0+1 (or not use RAID at
all), which would be unacceptable in a production environment, but it can
still give some useful info.


The 5th box _was_ purchased to be a DB server, but one to store and
analyse large amounts of log data, so large amounts of data storage were
more important than raw DB performance (although we did max out the RAM at
16G to try to make up for it). It was a deliberate price/performance
tradeoff. This machine ran ~$20k, but a similar capacity with SCSI drives
would have been FAR more expensive (IIRC a multiple of 4x or more).


Hopefully, when publicly visible benchmarks are performed, machines are 
used that comply with common engineering knowledge, ignoring those guys 
who still believe that sequential performance is the most important issue 
on disk subsystems for DBMS.


Are you saying that I shouldn't do any benchmarks because the machines
aren't what you would consider good enough?


If so, I disagree with you: benchmarks should be done on even worse
machines, but they should also be done on better machines. (Are you
volunteering to provide time on better machines for benchmarks?)


Not everyone will buy a lot of high-end hardware before they start using a
database. In fact, most companies will start with a database on lower-end
hardware and then, as their requirements grow, move to better hardware.
I'm willing to bet that what I have available is better than the starting
point for most places.


Postgres needs to work on the low-end stuff as well as the high-end stuff,
or people will write their apps to work with things that DO run on low-end
hardware, and then spend much more money than is needed scaling the
hardware up rather than rewriting their app.


Part of the reason I made the post on /. to start this was the hope that a
reasonable set of benchmarks could be hammered out, and that more people
than just me could then run them to get a wider range of results.


David Lang



Re: [PERFORM] Open request for benchmarking input

2005-11-26 Thread David Lang

By the way, this is the discussion that prompted me to start this project:
http://lwn.net/Articles/161323/

David Lang



Re: [PERFORM] Open request for benchmarking input

2005-11-26 Thread David Lang

Ok, I've subscribed (hopefully list volume won't kill me :-)

I'm covering several things in this message since I didn't receive the 
prior messages in the thread


First off, these benchmarks are not being sponsored by my employer; they
need the machines burned in, so I'm going to use them for the tests while
burning them in. I can spend a little official time on this, justifying it
as learning the proper config/tuning settings for our project, but not too
much. And I'm deliberately not using my work e-mail and not mentioning the
company name, so please respect this and keep the two separate (some of
you have dealt with us formally, others will over the next few months).


This means no remote access for people (but I am willing to run tests and
send configs around); in fact the machines will not have Internet access
for the duration of the tests.


It also means I'm doing almost all the configuration work for this on my
own time (nights and weekends). The machines will not be moved to
production for a couple of months, which should mean we can go back and
forth with questions and answers (albeit somewhat slowly, with me checking
in every night) while whatever tests are done happen during the day. Once
we get past the system tuning and start doing different tests, it would
probably be helpful if people can send me scripts to run that I can just
let loose.


I don't have any money to pay for benchmark suites, so if things like the
TPC benchmarks cost money to run, I won't be able to do them.


To clarify the hardware:

I have 5 machines total to work with, and this includes the client
machines that make the queries (I may be able to get hold of 2-3 more, but
they are similar configs).


None of these have dual-core processors; the CPUs are 246 or 252 Opterons
(I'll have to double-check which is in which machine; I think the
large-disk machine has 246s and the others 252s).


I have access to a gig-E switch that's on a fairly idle network to use to 
connect these machines


The large-disk machine has 3ware 9500-series 8-port SATA controllers with
battery backup. In our official dealings with Greenplum we attempted to do
a set of benchmarks on that machine, but the timing was horrible (I was
too busy when they worked with us on this) and we never did figure out the
best settings to use for it.


Part of the reason I posted this to /. rather than just contacting you and
the MySQL folks directly is that I would like to see a reasonable set of
benchmarks agreed to, and have people with different hardware than I have
run the same sets of tests. I know the tuning will be different for
different hardware, but if we can have a bunch of people run similar tests
we should learn a lot.


David Lang
