Re: [PERFORM] Any better plan for this query?..

2009-05-07 Thread Gregory Stark
Simon Riggs  writes:

> We already know that MySQL favors nested loop joins

From what I read I thought that was the *only* type of join MySQL supports.

The big picture view here is that whether a short query runs in half a
millisecond or in two milliseconds is usually not really important. It could
matter if you're concerned with how many transactions/s you can run on a busy
server -- but that's not exactly the same thing, and in that case you should
measure that directly.

It would be nice if we were in the same ballpark as MySQL, but we would only
be interested in such optimizations if they don't come at the expense of
scalability under more complex workloads.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-18 Thread Gregory Stark

"Jignesh K. Shah"  writes:

> In next couple of weeks I plan to test the patch on a different x64 based
> system to do a sanity testing on lower number of cores and also try out other
> workloads ...

I'm actually more interested in the large number of cores but fewer processes
and lower max_connections. If you set max_connections to 64 and eliminate the
wait time you should, in theory, be able to get 100% cpu usage. It would be
very interesting to track down the contention which is preventing that.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU Utilization

2009-03-16 Thread Gregory Stark
Greg Smith  writes:

> On Mon, 16 Mar 2009, Joe Uhl wrote:
>
>> Here is vmstat 1 30.  We are under peak load right now so I can gather
>> information from the real deal
>
> Quite helpful, reformatting a bit and picking an informative section:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd     free   buff    cache   si   so    bi    bo   in    cs us sy id wa
>  0 34  95048 11025880  56988 15020168    0    0  3852   160 3616  8614 11  1  6 82
>  3 25  95048 10996356  57044 15044796    0    0  7892   456 3126  7115  4  3  8 85
>  1 26  95048 10991692  57052 15050100    0    0  5188   176 2566  5976  3  2 12 83
>
> This says that your server is spending all its time waiting for I/O; actual
> CPU utilization is minimal.  You're only achieving around 3-8MB/s of random
> I/O.  That's the reality of what your disk I/O subsystem is capable of,
> regardless of what its sequential performance with dd looks like.  If you
> were to run a more complicated benchmark like bonnie++ instead, I'd bet that
> your "seeks/second" results are very low, even though sequential read/write
> is fine.
>
> The Perc5 controllers have a pretty bad reputation for performance on this
> list, even in RAID10.  Not much you can do about that beyond scrapping the
> controller and getting a better one.

Hm, well the tests I ran for posix_fadvise were actually on a Perc5 -- though
who knows if it was the same under the hood -- and I saw better performance
than this. I saw about 4MB/s for a single drive and up to about 35MB/s for 15
drives. However this was using linux md raid-0, not hardware raid.

But you shouldn't get your hopes up too much for random i/o. 3-8MB/s seems low
but consider the following:

 $ units
 2445 units, 71 prefixes, 33 nonlinear units

 You have: 8kB / .5|7200min
 You want: MB/s
 * 1.92
 / 0.5208

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres benchmarking with pgbench

2009-03-16 Thread Gregory Stark
Greg Smith  writes:

> On Mon, 16 Mar 2009, Gregory Stark wrote:
>
>> Why would checkpoints force out any data? It would dirty those pages and then
>> sync the files marking them clean, but they should still live on in the
>> filesystem cache.
>
> The bulk of the buffer churn in pgbench is from the statement that updates a
> row in the accounts table.  That constantly generates updated data block and
> index block pages.  If you can keep those changes in RAM for a while before
> forcing them to disk, you can get a lot of benefit from write coalescing that
> goes away if constant checkpoints push things out with a fsync behind them.
>
> Not taking advantage of that effectively reduces the size of the OS cache,
> because you end up with a lot of space holding pending writes that wouldn't
> need to happen at all yet were the checkpoints spaced out better.

Ok, so it's purely a question of write i/o, not reduced cache effectiveness.
I think I could see that. I would be curious to see these results with a
larger checkpoint_segments setting.
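
For reference, the values a running server is using can be checked from any
session; checkpoint_segments itself has to be raised in postgresql.conf
followed by a reload:

  SHOW checkpoint_segments;            -- default is 3, often far too low for pgbench
  SHOW checkpoint_completion_target;   -- 8.3+: spreads checkpoint writes out over time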

Looking further at the graphs I think they're broken, but not in the way I had
guessed. It looks like they're *overstating* the point at which the drop
occurs. Looking at the numbers it's clear that below 1GB it performs well but
at 1.5GB it's already dropping to the disk-resident speed.

I think pgbench is just not that great a model for real-world usage: a) most
real-world workloads are limited by read traffic, not write traffic, and
certainly not random update write traffic; and b) most real-world workloads
follow a less uniform distribution, so keeping busy records and index regions
in memory is more effective.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres benchmarking with pgbench

2009-03-16 Thread Gregory Stark
Greg Smith  writes:

> On Mon, 16 Mar 2009, m...@bortal.de wrote:
>
>> Any idea why my performance collapses at 2GB database size?

I don't understand how you get that graph from the data above. The data above
seems to show your test databases at 1.4GB and 2.9GB. There are no 1GB and 2GB
data points like the graphs show.

Presumably the data point at 2G on the graph should really be at 2.9GB? In
which case I don't find it surprising at all that performance would start to
shift from RAM-resident before that to disk-resident above that. You have 1GB
set aside for shared buffers leaving about 3GB for filesystem cache.

You could try setting shared_buffers smaller, perhaps 512MB, or larger,
perhaps 3.5GB, to minimize the overlap. I would tend to avoid the latter
though.

One thing to realize is that pgbench uses a completely flat distribution of
data accesses, so every piece of data is equally likely to be used. In
real-life workloads there are usually some busier and some less busy sections
of the database, and the cache tends to keep the hotter data resident even as
the data set grows.

> In your case, you've got shared_buffers=1GB, but the rest of the RAM in the
> server isn't so useful to you because you've got checkpoint_segments set to
> the default of 3.  That means your system is continuously doing small
> checkpoints (check your database log files, you'll see what I mean), which
> keeps things from ever really using much RAM before everything has to get
> forced to disk.

Why would checkpoints force out any data? It would dirty those pages and then
sync the files marking them clean, but they should still live on in the
filesystem cache.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-16 Thread Gregory Stark
"Jignesh K. Shah"  writes:

> Generally when there is dead constant.. signs of classic bottleneck ;-)  We
> will be fixing one to get to another.. but knocking bottlenecks is the name of
> the game I think

Indeed. I think the bottleneck we're interested in addressing here is why you
say you weren't able to saturate the 64 threads with 64 processes when they're
all RAM-resident.

From what I see you still have 400+ processes? Is that right?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Gregory Stark
Robert Haas  writes:

> On Fri, Mar 13, 2009 at 10:06 PM, Tom Lane  wrote:
>
>> I assume you meant effective_io_concurrency.  We'd still need a special
>> case because the default is currently hard-wired at 1, not 0, if
>> configure thinks the function exists.  Also there's a posix_fadvise call
>> in xlog.c that that parameter doesn't control anyhow.
>
> I think 1 should mean no prefetching, rather than 0.  If the number of
> concurrent I/O requests was 0, that would mean you couldn't perform
> any I/O at all.

That is actually how I had intended it, but apparently I messed it up at some
point such that later patches were doing some prefetching at 1 and there was
no way to disable it. When Tom reviewed it he fixed that by making 0 disable
prefetching.

I didn't think it was worth raising as an issue, but I hadn't realized we were
currently doing prefetching by default. Even on a system with posix_fadvise
there's not much to be gained unless the data is on a RAID device, so the
original objection holds anyway. We shouldn't do any prefetching unless the
user tells us to.
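
For reference, assuming the GUC keeps the name and semantics discussed above
(0 disables prefetching entirely), it can be set per session:

  -- effective_io_concurrency controls how many concurrent posix_fadvise()
  -- hints are issued (for bitmap heap scans in 8.4)
  SET effective_io_concurrency = 0;   -- no prefetching at all
  SET effective_io_concurrency = 4;   -- e.g. roughly one per spindle in a small array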

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Gregory Stark
Tom Lane  writes:

> Alan Stange  writes:
>> Gregory Stark wrote:
>>> AFAIK Opensolaris doesn't implement posix_fadvise() so there's no benefit.
>
>> It's implemented.   I'm guessing it's not what you want to see though:
>> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/gen/posix_fadvise.c
>
> Ugh.  So apparently, we actually need to special-case Solaris to not
> believe that posix_fadvise works, or we'll waste cycles uselessly
> calling a do-nothing function.  Thanks, Sun.

Do we? Or do we just document that setting effective_io_concurrency on Solaris
won't help?

I'm leaning towards the latter because I expect Sun will implement this and
there will be people running 8.4 on newer versions of the OS long after it's
out.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Gregory Stark

A minute ago I said:

 AFAIK Opensolaris doesn't implement posix_fadvise() so there's no benefit. It
 would be great to hear if you could catch the ear of the right people to get
 an implementation committed. Depending on how the i/o scheduler system is
 written it might not even be hard -- the Linux implementation of WILLNEED is
 all of 20 lines.

I noticed after sending it that that's slightly unfair. The 20-line function
calls another function (which calls another function) to do the real readahead
work. That function (mm/readahead.c:__do_page_cache_readahead()) is 48 lines.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Gregory Stark
"Jignesh K. Shah"  writes:

> Gregory Stark wrote:
>> Keep in mind when you do this that it's not interesting to test a number of
>> connections much larger than the number of processors you have. Once the
>> system reaches 100% cpu usage it would be a misconfigured connection pooler
>> that kept more than that number of connections open.
>
> Greg, Unfortunately the problem is that.. I am trying to reach 100% CPU which
> I cannot and hence I am increasing the user count :-)

The effect of increasing the number of users with a connection pooler would be
to decrease the 200ms sleep time to 0.

This is all assuming the idle time is *between* transactions. If you have idle
time in the middle of transactions things become a lot more tricky. I think we
are missing something to deal with that use case.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Gregory Stark

"Jignesh K. Shah"  writes:

> Can we do a vote on which specific performance features we want to test?
>
> Many of the improvements may not be visible through this standard tests so
> feedback on testing methology for those is also appreciated.
> * Visibility map - Reduce Vacuum overhead - (I think I can time vacuum with
> some usage on both databases)

Timing vacuum is kind of pointless -- the only thing that matters is whether
it's "fast enough". But it is worth saying that good benchmarks should include
normal vacuum runs. Benchmarks which don't run long enough to trigger vacuum
aren't realistic.

> * Prefetch IO with posix_fadvise() - Though I am not sure if it is supported
> on UNIX or not (but can be tested by standard tests)

Well clearly this is my favourite :)

AFAIK Opensolaris doesn't implement posix_fadvise() so there's no benefit. It
would be great to hear if you could catch the ear of the right people to get
an implementation committed. Depending on how the i/o scheduler system is
written it might not even be hard -- the Linux implementation of WILLNEED is
all of 20 lines.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Gregory Stark
"Jignesh K. Shah"  writes:

> Scott Carey wrote:
>> On 3/12/09 11:37 AM, "Jignesh K. Shah"  wrote:
>>
>> In general, I suggest that it is useful to run tests with a few different
>> types of pacing. Zero delay pacing will not have realistic number of
>> connections, but will expose bottlenecks that are universal, and less
>> controversial
>
> I think I have done that before so I can do that again by running the users
> at 0 think time which will represent a "Connection pool" which is highly
> utilized, and test how big the connection pool can be before the throughput
> tanks.. This can be useful for App Servers which set up connection pools of
> their own talking with PostgreSQL.

A minute ago I said:

  Keep in mind when you do this that it's not interesting to test a number of
  connections much larger than the number of processors you have. Once the
  system reaches 100% cpu usage it would be a misconfigured connection pooler
  that kept more than that number of connections open.

Let me give another reason to call this misconfigured: Postgres connections
are heavyweight and it's wasteful to keep them around but idle. This has a lot
in common with the issue with non-persistent connections where each connection
is used for only a short amount of time.

In Postgres each connection requires a process, which already limits
scalability on a lot of operating systems. On many operating systems, having
thousands of processes would in itself create a lot of issues.

Each connection then allocates memory locally for things like temporary table
buffers, sorting, hash tables, etc. On most operating systems this memory is
not freed back to the system when it hasn't been used recently. (Worse, it's
more likely to be paged out and have to be paged in from disk even if it
contains only garbage we intend to overwrite!).

As a result, having thousands of processes -- aside from any contention --
would lead to inefficient use of system resources. Consider for example that
if your connections are using 1MB each, then a thousand of them are using 1GB
of RAM, when only 64MB is actually useful at any given time. I bet that 64MB
would fit entirely in your processor caches if you weren't jumping around in
the gigabyte of local memory your thousands of processes have allocated.

Consider also that you're limited to setting relatively small settings of
work_mem for fear all your connections might happen to start a sort
simultaneously. So (in a real system running arbitrary queries) instead of a
single quicksort in RAM you'll often be doing unnecessary on-disk merge sorts
using unnecessarily small merge heaps while gigabytes of RAM either go wasted
to cover a rare occurrence or are being used to hold other sorts which have
been started but context-switched away.

To engineer a system intended to handle thousands of simultaneous connections
you would want each backend to use the most light-weight primitives such as
threads, and to hold the least possible state in local memory. That would look
like quite a different system. The locking contention is the least of the
issues we would want to deal with to get there.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-13 Thread Gregory Stark
"Jignesh K. Shah"  writes:

> Scott Carey wrote:
>> On 3/12/09 11:37 AM, "Jignesh K. Shah"  wrote:
>>
>> In general, I suggest that it is useful to run tests with a few different
>> types of pacing. Zero delay pacing will not have realistic number of
>> connections, but will expose bottlenecks that are universal, and less
>> controversial
>
> I think I have done that before so I can do that again by running the users
> at 0 think time which will represent a "Connection pool" which is highly
> utilized, and test how big the connection pool can be before the throughput
> tanks.. This can be useful for App Servers which set up connection pools of
> their own talking with PostgreSQL.

Keep in mind when you do this that it's not interesting to test a number of
connections much larger than the number of processors you have. Once the
system reaches 100% cpu usage it would be a misconfigured connection pooler
that kept more than that number of connections open.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Proposal of tunable fix for scalability of 8.4

2009-03-12 Thread Gregory Stark
Grzegorz Jaśkiewicz  writes:

> So please, don't say that this doesn't make sense because he tested it
> against ram disc. That was precisely the point of the exercise.

What people are tip-toeing around saying, which I'll just say right out in the
most provocative way, is that Jignesh has simply *misconfigured* the system.
He's contrived to artificially create a lot of unnecessary contention.
Optimizing the system to reduce the cost of that artificial contention at the
expense of a properly configured system would be a bad idea.

It's misconfigured because there are more runnable threads than there are
cpus. A lot more. 15 times as many as necessary. If users couldn't run
connection poolers on their own the right approach for us to address this
contention would be to build one into Postgres, not to re-engineer the
internals around the misuse.

RAM-resident use cases are entirely valid and worth testing, but in those use
cases you would want to have about as many processes as you have processors.

The use case where having larger number of connections than processors makes
sense is when they're blocked on disk i/o (or network i/o or whatever else
other than cpu).

And having it be configurable doesn't mean that it has no cost. Having a test
of a user-settable dynamic variable in the middle of a low-level routine could
very well have some cost. Just the extra code would have some cost in reduced
cache efficiency. It could be that branch prediction and so on save us, but
that remains to be proven.

And as always the question would be whether code designed for this
misconfigured setup is worth the maintenance effort if it's not helping
properly configured setups. Consider for example that any work with dtrace to
optimize locks under properly configured setups would lead us to make changes
which would have to be tested twice, once with and once without this option.
What do we do if dtrace says some unrelated change helps systems with this
option disabled but hurts systems with it enabled?



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL block size for SSD RAID setup?

2009-02-25 Thread Gregory Stark
henk de wit  writes:

> Hi,
> I was reading a benchmark that sets out block sizes against raw IO performance
> for a number of different RAID configurations involving high end SSDs (the
> Mtron 7535) on a powerful RAID controller (the Areca 1680IX with 4GB RAM). See
> http://jdevelopment.nl/hardware/one-dvd-per-second/

You might also be interested in:

http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/

http://thunk.org/tytso/blog/2009/02/22/should-filesystems-be-optimized-for-ssds/

It seems you have to do more work than just look at the application. You want
the application, the filesystem, the partition layout, and the raid device
geometry to all consistently maintain alignment with erase blocks.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Abnormal performance difference between Postgres and MySQL

2009-02-23 Thread Gregory Stark
Farhan Husain  writes:

> I can provide any other information needed and also the data if anyone
> wants.

What did the query plans look like in both databases?

In Postgres you can get the query plan with

EXPLAIN ANALYZE select ...

You can leave out the ANALYZE if you can't wait for the query to complete, but
it will have much less information to diagnose any problems.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] TCP network cost

2009-02-18 Thread Gregory Stark
"Ross J. Reedstrom"  writes:

> On Tue, Feb 17, 2009 at 12:20:02AM -0700, Rusty Conover wrote:
>> 
>> Try running tests with ttcp to eliminate any PostgreSQL overhead and  
>> find out the real bandwidth between the two machines.  If its results  
>> are also slow, you know the problem is TCP related and not PostgreSQL  
>> related.
>
> I did in fact run a simple netcat client/server pair and verified that I
> can transfer that file in 0.12 sec on localhost (or hostname) and 0.35 sec
> over the net, so the TCP stack and network are not to blame. This is purely
> an issue inside the postgresql code, I believe.

There's not much Postgres can do to mess up TCP/IP. The only things that come
to mind are a) making lots of short-lived connections and b) getting caught by
Nagle when doing lots of short operations and blocking waiting on results.

What libpq (or other interface) operations are you doing exactly?

[also, your Mail-Followup-To has a bogus email address in it. Please don't do
that]

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query composite index range in an efficient way

2009-02-17 Thread Gregory Stark

Havasvölgyi Ottó  writes:

> I also tried Row constructors with a Between expression, but in this case
> Postgres handled the elements of the row independently, and this led to
> false query result.

What version of Postgres is this? Row constructors were fixed a long time ago
(in 8.2) to not behave that way, and the main benefit of that change was
precisely that this type of expression can use a multi-column index
effectively.
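
For example, something along these lines (table and column names invented)
should be able to use a composite index on 8.2 or later:

  CREATE INDEX orders_cust_date_idx ON orders (customer_id, order_date);

  -- on 8.2+ the row comparisons are evaluated lexicographically, so this is a
  -- single contiguous range of the index rather than per-column comparisons
  SELECT *
  FROM orders
  WHERE (customer_id, order_date) >= (42, DATE '2009-01-01')
    AND (customer_id, order_date) <= (42, DATE '2009-01-31');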

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller

2009-02-16 Thread Gregory Stark
Arjen van der Meijden  writes:

> When we purchased our Perc 5/e with MD1000 filled with 15 15k rpm sas disks, 
> my
> colleague actually spend some time benchmarking the PERC and a ICP Vortex
> (basically a overclocked Adaptec) on those drives. Unfortunately he doesn't
> have too many comparable results, but it basically boiled down to quite good
> scores for the PERC and a bit less for the ICP Vortex.
> IOMeter sequential reads are above 300MB/s for the RAID5 and above 240MB/s for
> a RAID10 (and winbench99 versions range from 400+ to 600+MB/s). 

FWIW those are pretty terrible numbers for fifteen 15k rpm drives. They're
about what you would expect from a PCI-X card that was bus-bandwidth limited.
A PCI-e card should be able to get about 3x that from the drives.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller

2009-02-05 Thread Gregory Stark
Scott Marlowe  writes:

> We purchased the Perc 5E, which dell wanted $728 for last fall with 8
> SATA disks in an MD-1000 and the performance is just terrible.  No
> matter what we do the best throughput on any RAID setup was about 30
> megs/second write and 60 Megs/second read.  

Is that sequential or a mix of random and sequential (It's too high to be
purely random i/o)? A single consumer drive should be able to beat those
numbers on sequential i/o. If it's a mix of random and sequential then
performance will obviously depend on the mix.

> I can get that from a mirror set of the same drives under linux kernel
> software RAID.

Why is that surprising? I would expect software raid to handle 8 drives
perfectly well, assuming you have a controller and bus you aren't saturating.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Deleting millions of rows

2009-02-04 Thread Gregory Stark
Robert Haas  writes:

> That's good if you're deleting most or all of the parent table, but
> what if you're deleting 100,000 values from a 10,000,000 row table?
> In that case maybe I'm better off inserting all of the deleted keys
> into a side table and doing a merge or hash join between the side
> table and the child table...

It would be neat if we could feed the queued trigger tests into a plan node
like a Materialize and use the planner to determine which type of plan to
generate.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Sort performance

2009-01-29 Thread Gregory Stark
Robert Haas  writes:

> On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84
>  wrote:
>>
>> I see the sort operation spilling to disk, writing up to 430MB, and then
>> returning the first 500 rows. Our query is of the sort
>>
>> Now if I set work_mem to 500MB (I did this in a psql session without
>> making it global) and run the same query, one would think the sort
>> would happen in memory and not spill to disk, but I still see
>> 430MB written to disk. However, the query completion time dropped to
>> 351 secs. So work_mem did have an impact, but I'm wondering why it's still
>> writing to disk when it can do it all in memory.

The on-disk representation is more compact than the in-memory representation,
so you actually need a larger value than the space reported for the on-disk
sort in order to avoid spilling to disk entirely. The accounting also isn't
perfect; the on-disk sort still uses some RAM, for example.

> What happens if you set work_mem to something REALLY big, like 5GB?

Don't set it larger than the available RAM though -- or you'll quite possibly
get an out-of-memory error.
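
To experiment with that, work_mem can be raised for just one session and the
effect checked directly; the table and column names below are only
illustrative:

  SET work_mem = '600MB';    -- session-local; larger than the reported 430MB on-disk size

  EXPLAIN ANALYZE
  SELECT * FROM some_large_table ORDER BY some_column;
  -- On 8.3 and later the plan output reports either
  --   Sort Method: external merge  Disk: ...kB
  -- or
  --   Sort Method: quicksort  Memory: ...kB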


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] NOT IN >2hrs vs EXCEPT < 2 sec.

2009-01-29 Thread Gregory Stark
Kevin Traster  writes:

> Regarding the previous posts about the same issue of performance between
> NOT IN versus EXCEPT: there has not been any answer to explain it - just
> talk about the differences between the two results.
>
> Yes, I can still get the results using EXCEPT, but it would be nice to know
> why I can't get NOT IN to complete the simple query.


There are two answers here. One you've already been given, that NOT IN has to
handle NULLs specially and that makes these plans not equivalent. The NOT IN
is decidedly harder to solve.

The other answer is that EXCEPT is a set operation which in Postgres uses a
completely different set of logic. Even if you used NOT EXISTS which really is
equivalent to EXCEPT the resulting plans would be different. Which one would
be better would depend on the circumstances. In an ideal world every
equivalent query would generate identical plans. We don't live in an ideal
world and Postgres isn't perfect.
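
To make the comparison concrete, here are the three formulations side by side,
with made-up table names; they only return the same rows when the inner column
contains no NULLs:

  -- "ids present in a but not in b", three ways
  SELECT id FROM a WHERE id NOT IN (SELECT id FROM b);

  SELECT id FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);

  SELECT id FROM a
  EXCEPT
  SELECT id FROM b;   -- set operation; also removes duplicate ids from the output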


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD performance

2009-01-25 Thread Gregory Stark
da...@lang.hm writes:

> they currently have it do a backup immediatly on power loss (which is a safe
> choice as the contents won't be changing without power), but it then powers 
> off
> (which is not good for startup time afterwords)

So if you have a situation where it's power cycling rapidly, each iteration
drains the battery for the time it takes to save the state but only charges it
for the time the power is on. I wonder how many iterations that gives you.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow query

2009-01-12 Thread Gregory Stark

"Scott Marlowe"  writes:

>  ->  Index Scan using users_orgid_idx on users u
>      (cost=0.00..129.52 rows=5 width=271) (actual time=843.825..860.638 rows=0 loops=35)
>        Index Cond: (u.orgid = j2.orgid)
>        Filter: ((u.type_id < 10) AND (u.deleted = 0) AND
>                 ((lower((u.lname)::text) ~~ 'boat%'::text) OR
>                  (lower((u.fname)::text) ~~ 'boat%'::text) OR
>                  (lower((u.username)::text) ~~ 'boat%'::text) OR
>                  (lower(u.option1) ~~ 'boat%'::text) OR
>                  (lower((u.email)::text) ~~ '%boat%'::text) OR
>                  (lower(u.external_id) = 'boat'::text)))

Not sure if this is what's going on but I find the high startup time for this
index scan suspicious. Either there are a lot of dead tuples (which would
explain the second run being fast if it marks them all as lp_dead) or there
are a lot of matching index pointers which fail those other constraints.
Assuming it's the latter perhaps some other index definition would let it zero
in on the right tuples more quickly instead of having to grovel through a lot
of irrelevant rows?
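
For instance, if prefix searches on lname dominate, an expression index along
these lines might help; this is only a sketch based on the column names in the
posted plan, and text_pattern_ops is what makes LIKE 'boat%' indexable in a
non-C locale:

  CREATE INDEX users_lower_lname_idx
      ON users (lower(lname) text_pattern_ops);

  -- a prefix search on the same expression can then use an index range scan
  SELECT * FROM users WHERE lower(lname) LIKE 'boat%';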

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] understanding postgres issues/bottlenecks

2009-01-10 Thread Gregory Stark
Ron  writes:

> At 10:36 AM 1/10/2009, Gregory Stark wrote:
>>
>> Or a system crash. If the kernel panics for any reason when it has dirty
>> buffers in memory the database will need to be restored.
>
> A power conditioning UPS should prevent a building wide or circuit level bad
> power event

Except of course those caused *by* a faulty UPS. Or for that matter by the
power supply in the computer or drive array, or someone just accidentally
hitting the wrong power button.

I'm surprised people are so confident in their kernels though. I know some
computers with uptimes measured in years but I know far more which don't.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] understanding postgres issues/bottlenecks

2009-01-10 Thread Gregory Stark
da...@lang.hm writes:

> On Sat, 10 Jan 2009, Markus Wanner wrote:
>
>> My understanding of SSDs so far is, that they are not that bad at
>> writing *on average*, but to perform wear-leveling, they sometimes have
>> to shuffle around multiple blocks at once. So there are pretty awful
>> spikes for writing latency (IIRC more than 100ms has been measured on
>> cheaper disks).

That would be fascinating. And frightening. A lot of people have been
recommending these for WAL disks, and this would make them actually *worse*
than regular drives.

> well, I have one of those cheap disks.
>
> brand new out of the box, format the 32G drive, then copy large files to it
> (~1G per file). This should do almost no wear-leveling, but its write
> performance is still poor and it has occasional 1 second pauses.

This isn't similar to the way WAL behaves though. What you're testing is the
behaviour when the bandwidth to the SSD is saturated. At that point something
in the stack, whether the SSD itself, the USB hardware or driver, or the OS
buffer cache, can start to queue up writes. The stalls you see could be the
behaviour when that queue fills up and it needs to push back to higher layers.

To simulate WAL you want to transfer smaller volumes of data, well below the
bandwidth limit of the drive, fsync the data, pause a bit, and repeat. Time
each fsync and see whether the time it takes is proportional to the amount of
data written in the meantime or whether it randomly spikes upwards.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] understanding postgres issues/bottlenecks

2009-01-10 Thread Gregory Stark
"Scott Marlowe"  writes:

> On Sat, Jan 10, 2009 at 5:40 AM, Ron  wrote:
>> At 03:28 PM 1/8/2009, Merlin Moncure wrote:
>>> just be aware of the danger .  hard reset (power off) class of failure
>>> when fsync = off means you are loading from backups.
>>
>> That's what redundant power conditioning UPS's are supposed to help prevent
>> ;-)
>
> But of course, they can't prevent them, but only reduce the likelihood
> of their occurrance.  Everyone who's working in large hosting
> environments has at least one horror story to tell about a power
> outage that never should have happened.

Or a system crash. If the kernel panics for any reason when it has dirty
buffers in memory the database will need to be restored.

>> ...and of course, those lucky few with bigger budgets can use SSD's and not
>> care what fsync is set to.
>
> Would that prevent any corruption if the writes got out of order
> because of lack of fsync?  Or partial writes?  Or wouldn't fsync still
> need to be turned on to keep the data safe.

I think the idea is that with SSDs or a RAID with a battery backed cache you
can leave fsync on and not have any significant performance hit since the seek
times are very fast for SSD. They have limited bandwidth but bandwidth to the
WAL is rarely an issue -- just latency.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor plan choice in prepared statement

2008-12-30 Thread Gregory Stark
da...@lang.hm writes:

> since there is not a pre-parsed interface for queries, it may make sense to
> setup a way to have the query pre-parsed, but not pre-planned for cases like
> this.

What would be more interesting would be to have plans that take into account
the outlier values and have alternative plans for those values.


One aspect that hasn't been discussed here is whether it's only certain
outlier arguments that cause Postgres to choose the poor plan for you or
whether it always chooses it for all the sets of arguments you actually use.

If it's the former then it's possible you're only being bitten if the first
time you prepare the query happens to have one of these outlier sets of
parameters. I forget what version this went in, but I think it was before 8.2
that Postgres started using the first set of arguments seen to plan the query.
This is usually an improvement over just guessing, but if that first set is
unusual it could lead to strange results.
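
One way to check which case you're in is to compare the plan the prepared
statement actually uses against the plan chosen when the value is written
inline (the table here is hypothetical):

  PREPARE q(int) AS
      SELECT * FROM orders WHERE customer_id = $1;

  EXPLAIN EXECUTE q(42);                                -- plan the prepared statement uses
  EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- plan with the literal inlined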


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5

2008-12-22 Thread Gregory Stark
"Mark Wong"  writes:

> Thanks for the input.

In a more constructive vein:

1) autovacuum doesn't seem to be properly tracked. It looks like you're just
   tracking the autovacuum process and not the actual vacuum subprocesses
   which it spawns.

2) The response time graphs would be more informative if you excluded the
   ramp-up portion of the test. As it is there are big spikes at the low end
   but it's not clear whether they're really part of the curve or due to
   ramp-up. This is especially visible in the stock-level graph where it
   throws off the whole y scale.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5

2008-12-22 Thread Gregory Stark
"Mark Wong"  writes:

>> I'm not sure how bad that is for the benchmarks. The only effect that comes
>> to mind is that it might exaggerate the effects of some i/o intensive
>> operations that under normal conditions might not cause any noticeable
>> impact, like wal log file switches or even checkpoints.
>
> I'm not sure I'm following.  

All I'm saying is that the performance characteristics won't be the same when
the service wait times are 1-10 seconds rather than the 20-30ms at which alarm
bells would start to ring on a real production system.

I'm not exactly sure what changes it might make though.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5

2008-12-21 Thread Gregory Stark
Mark Wong  writes:

> On Dec 20, 2008, at 5:33 PM, Gregory Stark wrote:
>
>> "Mark Wong"  writes:
>>
>>> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark.  We
>>> are using a 1000 warehouse database, which amounts to about 100GB of
>>> raw text data.
>>
>> Really? Do you get conforming results with 1,000 warehouses? What's  the 95th
>> percentile response time?
>
> No, the results are not conforming.  You and others have pointed that out
> already.  The 95th percentile response times are calculated on each page of
> the previous links.

Where exactly? Maybe I'm blind but I don't see them.

>
> I find your questions a little odd for the input I'm asking for.  Are  you
> under the impression we are trying to publish benchmarking  results?  Perhaps
> this is a simple misunderstanding?

Hm, perhaps. The "conventional" way to run TPC-C is to run it with larger and
larger scale factors until you find out the largest scale factor you can get a
conformant result at. In other words the scale factor is an output, not an
input variable.

You're using TPC-C just as an example workload and looking to see how to
maximize the TPM for a given scale factor. I guess there's nothing wrong with
that as long as everyone realizes it's not a TPC-C benchmark.

Except that if the 95th percentile response times are well above a second I
have to wonder whether the situation reflects an actual production OLTP system
well. It implies there are so many concurrent sessions that any given query is
being context switched out for seconds at a time.

I have to imagine that a real production system would consider the system
overloaded as soon as queries start taking significantly longer than they take
on an unloaded system. People monitor the service wait times and queue depths
for i/o systems closely and having several seconds of wait time is a highly
abnormal situation.

I'm not sure how bad that is for the benchmarks. The only effect that comes to
mind is that it might exaggerate the effects of some i/o intensive operations
that under normal conditions might not cause any noticeable impact like wal
log file switches or even checkpoints.

If you have a good i/o controller it might confuse your results a bit when
you're comparing random and sequential i/o because the controller might be
able to sort requests by physical position better than in a typical oltp
environment where the wait queues are too short to effectively do that.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5

2008-12-20 Thread Gregory Stark
"Mark Wong"  writes:

> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark.  We
> are using a 1000 warehouse database, which amounts to about 100GB of
> raw text data. 

Really? Do you get conforming results with 1,000 warehouses? What's the 95th
percentile response time?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Gregory Stark

Matthew Wakeling <[EMAIL PROTECTED]> writes:

> On Tue, 9 Dec 2008, Scott Marlowe wrote:
>> I wonder how many hard drives it would take to be CPU bound on random
>> access patterns?  About 40 to 60?  And probably 15k / SAS drives to
>> boot.  Cause that's what we're looking at in the next few years where
>> I work.
>
> There's a problem with that thinking. That is, in order to exercise many
> spindles, you will need to have just as many (if not more) concurrent 
> requests.
> And if you have many concurrent requests, then you can spread them over
> multiple CPUs. So it's more a case of "How many hard drives PER CPU". It also
> becomes a matter of whether Postgres can scale that well.

Well:

$ units
2445 units, 71 prefixes, 33 nonlinear units
You have: 8192 byte/5ms
You want: MB/s
* 1.6384
/ 0.61035156

At 1.6MB/s per drive, if you find Postgres is cpu-bound doing sequential scans
at 1GB/s, you'll need about 640 drives to keep one cpu satisfied doing random
I/O -- assuming you have perfect read-ahead and the read-ahead itself doesn't
add cpu overhead. Both of which are false of course, but at least in theory
that's what it'll take.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Gregory Stark
Scott Carey <[EMAIL PROTECTED]> writes:

> And as far as I can tell, even after the 8.4 fadvise patch, all I/O is in
> block_size chunks. (hopefully I am wrong) 
>...
> In addition to the fadvise patch, postgres needs to merge adjacent I/O's
> into larger ones to reduce the overhead. It only really needs to merge up to
> sizes of about 128k or 256k, and gain a 8x to 16x drop in syscall overhead,
> and additionally potentially save code trips down the shared buffer
> management code paths. At lest, thats my guess I haven't looked at any code
> and could be wrong.

There are a lot of assumptions here that I would be interested in seeing
experiments to back up.

FWIW when I was doing testing of posix_fadvise I did a *lot* of experiments
though only with a couple systems. One had a 3-drive array and one with a
15-drive array, both running Linux. I sometimes could speed up the sequential
scan by about 10% but not consistently. It was never more than about 15% shy
of the highest throughput from dd. And incidentally the throughput from dd
didn't seem to depend much at all on the blocksize.

On your system do "dd bs=8k" and "dd bs=128k" really show an 8x performance
difference?

In short, at least from the evidence available, this all seems like it might
be holdover beliefs from the olden days of sysadmining where syscalls were
much slower and OS filesystem caches much dumber.

I'm still interested in looking into it but I'll have to see actual vmstat or
iostat output while it's happening, preferably some oprofile results too. And
how many drives do you actually need to get into this situation? Also, what is
the output of "vacuum verbose" on the table?


> Additionally, the "If your operating system has any reasonable caching
> itself" comment earlier in this conversation --- Linux (2.6.18, Centos 5.2)
> does NOT. I can easily make it spend 100% CPU in system time trying to
> figure out what to do with the system cache for an hour. Just do large
> seqscans with memory pressure from work_mem or other forces that the OS will
> not deem 'idle'. Once the requested memory is ~75% of the system total, it
> will freak out. Linux simply will not give up that last 25% or so of the RAM
> for anything but page cache

This seems like just a misconfigured system. Linux and most Unixen definitely
expect to have a substantial portion of RAM dedicated to disk cache. Keep in
mind all your executable pages count towards this page cache too. You can
adjust this to some extent with the "swappiness" variable in Linux -- but I
doubt you'll be happy with the results regardless.

> The other way around (small shared_buffers, let the OS do it) hurts
> performance overall quite a bit -- randomly accessed pages get pushed out to
> the OS cache more often, and the OS tosses thouse out when a big seqscan
> occurs, resulting in a lot more random access from disk and more disk bound
> periods of time. Great wonder, this operating system caching, eh?

How do you observe this?


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Gregory Stark

Greg Smith <[EMAIL PROTECTED]> writes:

> On Mon, 8 Dec 2008, Merlin Moncure wrote:
>
>> I wonder if shared_buffers has any effect on how far you can go before
>> you hit the 'tipping point'.
>
> If your operating system has any reasonable caching itself, not so much at
> first.  As long as the index on the account table fits in shared_buffers, even
> the basic sort of caching logic an OS uses is perfectly functional for 
> swapping
> the individual pages of the account table in and out, the main limiting factor
> on pgbench performance.

I would expect higher shared_buffers to raise the curve before the first
breakpoint but after the first breakpoint make the drop steeper and deeper.
The equilibrium where the curve becomes flatter should be lower.

That is, as long as the database fits entirely in RAM having more of the
buffers be immediately in shared buffers is better. Once there's contention
for the precious cache stealing some of it for duplicated buffers will only
hurt.

> There is a further out tipping point I've theorized about but not really
> explored:  the point where even the database indexes stop fitting in memory
> usefully.  As you get closer to that, I'd expect that the clock sweep 
> algorithm
> used by shared_buffers should make it a bit more likely that those important
> blocks would hang around usefully if you put them there, rather than giving
> most of the memory to the OS to manage.

Hm, sounds logical. At that point the slow drop-off should become even
shallower and possibly become completely flat. Greater shared_buffers might
start helping again at that point.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partition table query performance

2008-11-26 Thread Gregory Stark
"Greg Jaman" <[EMAIL PROTECTED]> writes:

> I have a problem with partitioning and I'm wondering if anyone can provide
> some insight.   I'm trying to find the max value of a column across multiple
> partitions.  The query against the partition set is quite slow while queries
> against child partitions is very fast!

I'm afraid this is a known problematic use case of Postgres's current
partitioning support. Postgres is not capable of finding the plan you're
undoubtedly looking for, where it applies the same plan as your child-table
query to each partition in turn.

There are several groups working to improve this in different ways but none of
them appear to be on track to be in 8.4 so it will be 8.5 or later before they
appear. Sorry.
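
In the meantime the usual workaround is to spell the per-partition query out
by hand, e.g. (partition and column names invented):

  -- ask each child for its own max, which can use that child's index,
  -- then take the greatest of the per-partition answers
  SELECT max(m)
  FROM (
      SELECT max(reading_time) AS m FROM readings_2008_11
      UNION ALL
      SELECT max(reading_time) AS m FROM readings_2008_12
  ) AS per_partition;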

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] seq scan over 3.3 million rows instead of single key index access

2008-11-22 Thread Gregory Stark
"Andrus" <[EMAIL PROTECTED]> writes:

> There are  indexes on   rid(dokumnr) and dok(dokumnr) and dokumnr is int.
> Instead of using single key index, 8.1.4 scans over whole rid table.
> Sometimes idtelluued can contain more than single row so replacing join with
> equality is not possible.
>
> How to fix ?

Firstly the current 8.1 release is 8.1.15. Any of the bugs fixed in those 11
releases might be related to this.

Secondly:

> CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP;
> INSERT INTO idtellUued VALUES(1249228);
> explain analyze  select 1
>   from dok JOIN rid USING(dokumnr)
> JOIN idtellUued USING(dokumnr)
>
> "  ->  Seq Scan on idtelluued  (cost=0.00..31.40 rows=2140 
> width=4)
> (actual time=0.006..0.011 rows=1 loops=1)"

The planner thinks there are 2,140 rows in that temporary table so I don't
believe this is from the example posted. I would suggest running ANALYZE
idtellUued at some point before the problematic query.
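
Based on the posted example, that would look something like this:

  BEGIN;
  CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP;
  INSERT INTO idtellUued VALUES(1249228);
  ANALYZE idtellUued;   -- give the planner a real row count instead of the default guess

  EXPLAIN ANALYZE
  SELECT 1
  FROM dok JOIN rid USING (dokumnr)
           JOIN idtellUued USING (dokumnr);
  COMMIT;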


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query planner cost estimate less than the sum of its parts?

2008-11-05 Thread Gregory Stark

"Scott Carey" <[EMAIL PROTECTED]> writes:

> Certainly, a cost estimate that is ... LESS than one of the sub sections of
> the query is wrong.   This was one hell of a broken query, but it at least
> should have taken an approach that was not a nested loop, and I'm curious if
> that choice was due to a bad estimate here.
>
> Nested Loop IN Join  (cost=0.00..3850831.86 rows=128266 width=8)

Because it's an IN join it doesn't have to run the inner side to completion.
Once it finds a match it can return the outer tuple and continue on to the
next outer tuple.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-31 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Jeff Frost <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Huh.  That does sound like it's a version-to-version difference.
>>> There's nothing in the CVS log that seems related though.  Are you
>>> willing to post your test case?
>>> 
>> It's a customer DB, so I'll contact them and see if we can boil it down
>> to a test case with no sensitive data.
>
> Well, if there was a change it seems to have been in the right direction
> ;-) so this is mostly just idle curiosity.  Don't jump through hoops to
> get a test case.

Assuming it's not a bug...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SAN and full_page_writes

2008-09-08 Thread Gregory Stark
"Nikolas Everett" <[EMAIL PROTECTED]> writes:

> Sounds kind of scary.  I think postgres forces the underlying OS and file
> system to do that stuff (sans the mutli-host magic) using fsync.  Is that
> right?

Yes, so you have to make sure that your filesystem really does support fsync
properly. I think most NFS implementations do that.

I was more concerned with:

Network Appliance supports a number of NFS client implementations for use
with databases. These clients provide write atomicity to at least 4 KB,
and support synchronous writes when requested by the database. Typically,
atomicity is guaranteed only to one virtual memory page, which may be as
small as 4 KB. However, if the NFS client supports a direct I/O mode that
completely bypasses the cache, then atomicity is guaranteed to the size
specified by the “wsize” mount option, typically 32 KB.

The failure of some NFS clients to assure write atomicity to a full
database block means that the soft atomicity requirement is not always
met. Some failures of the host system may result in a fractured database
block on disk. In practice such failures are rare. When they happen no
data is lost, but media recovery of the affected database block may be
required

That "media recovery" it's referring to sounds like precisely our WAL full
page writes...
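
For reference, the setting this hinges on -- whether it can be turned off
safely is exactly what depends on the atomicity guarantees above:

  SHOW full_page_writes;              -- 'on' by default
  -- postgresql.conf:  full_page_writes = on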


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SAN and full_page_writes

2008-09-08 Thread Gregory Stark

"Nikolas Everett" <[EMAIL PROTECTED]> writes:

> Thanks for pointing that out Bruce.
>
> NetApp has a 6 page PDF about NetApp and databases.  On page 4:

Skimming through this I think all 6 pages are critical. The sentence you quote
out of context pertains specifically to the NAS internal organization.

The previous pages discuss limitations of OSes, filesystems and especially NFS
clients which you may have to be concerned with as well.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] limit clause breaks query planner?

2008-09-04 Thread Gregory Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Guillaume Cottenceau <[EMAIL PROTECTED]> writes:
>> It seems to me that if the correlation is 0.99[1], and you're
>> looking for less than 1% of rows, the expected rows may be at the
>> beginning or at the end of the heap?
>
> Right, but if you know the value being searched for, you could then
> estimate where it is in the table by consulting the histogram.
>
> Actually, an even easier hack (which would have the nice property of not
> needing to know the exact value being searched for), would simply use
> the existing cost estimates if the WHERE variables have low correlation
> (meaning the random-locations assumption is probably good), but apply
> some sort of penalty factor if the correlation is high.  

Fwiw this will have all the same problems our existing uses of the correlation
have. That doesn't mean we shouldn't do it but I would expect it to be
improved along with the other uses when we find a better metric.

I did happen to speak to a statistician the other day and was given some terms
to google. I'll investigate and see if I get anything useful.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] limit clause breaks query planner?

2008-09-02 Thread Gregory Stark


>>>> "Limit  (cost=0.00..3.68 rows=15 width=128) (actual 
>>>> time=85837.043..85896.140 rows=15 loops=1)"
>>>> "  ->  Seq Scan on my_table this_  (cost=0.00..258789.88 rows=1055580 
>>>> width=128) (actual time=85837.038..85896.091 rows=15 loops=1)"
>>>> "Filter: ((A IS NULL) AND ((B)::text = '21'::text))"
>>>> "Total runtime: 85896.214 ms"

Postgres does collect and use statistics about what fraction of the "A" column
is null. It also collects and uses statistics about what fraction of the "B"
column is 21 (using a histogram). And it does take the LIMIT into account.

I think the other poster might well be right about this table being extremely
bloated. You could test by running and posting the results of:

VACUUM VERBOSE my_table

What it doesn't collect is where in the table those records are -- so if there
are a lot of them then it might use a sequential scan regardless of whether
they're at the beginning or end of the table. That seems unlikely to be the
problem though.

The other thing it doesn't collect is how many of the B=21 records have null
As. So if a large percentage of the table has A as null then it will assume
that's true for the B=21 records and if there are a lot of B=21 records then
it will assume a sequential scan will find matches quickly. If in fact the two
columns are highly correlated and B=21 records almost never have A null
whereas records with other values of B have lots of null values then Postgres
might make a bad decision here.

Also, it only has the statistics for B=21 via a histogram. If the distribution
of B is highly skewed so that, for example, values between 20 and 25 are very
common but B=21 happens to be quite rare, then Postgres might get a bad
estimate here. You could improve this by raising the statistics target for the
B column and re-analyzing.
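
Something like this, for example (names as in the posted plan; the default
target on this era of releases is only 10):

  ALTER TABLE my_table ALTER COLUMN B SET STATISTICS 100;
  ANALYZE my_table;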

That brings up another question -- when was the last time this table was
analyzed?

What estimates and actual results does postgres get for simple queries like:

EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE B=21;
EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL AND B=21;

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow update of index during insert/copy

2008-09-01 Thread Gregory Stark
"Scott Carey" <[EMAIL PROTECTED]> writes:

> On Raid Controllers and Dev machines:
>
> For a dev machine the battery backup is NOT needed.
>
> Battery back up makes a _production_ system faster:  In production, data
> integrity is everything, and write-back caching is dangerous without a
> battery back up.
>
> So:
> Without BBU:   Write-through cache = data safe in power failure; Write back
> cache = not safe in power failure.
> With BBU :   Both modes are safe on power loss.

This could be read the wrong way. With a BBU it's not that you can run the
drives in write-back mode safely. It's that you can cache in the controller's
battery-backed cache safely.
The drives still need to have their write caches off (ie, in write-through
mode).

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Big delete on big table... now what?

2008-08-22 Thread Gregory Stark
"Bill Moran" <[EMAIL PROTECTED]> writes:

> "Fernando Hevia" <[EMAIL PROTECTED]> wrote:
>>
>> Hi list.
>>  
>> I have a table with over 30 million rows. Performance was dropping steadily
>> so I moved old data not needed online to an historic table. Now the table
>> has about 14 million rows. I don't need the disk space returned to the OS
>> but I do need to improve performance. Will a plain vacuum do or is a vacuum
>> full necessary?
>> ¿Would a vacuum full improve performance at all?
>
> If you can afford the downtime on that table, cluster would be best.
>
> If not, do the normal vacuum and analyze.  This is unlikely to improve
> the performance much (although it may shrink the table _some_) but
> regular vacuum will keep performance from getting any worse.

Note that CLUSTER requires enough space to store the new and the old copies of
the table simultaneously. That's the main reason for VACUUM FULL to still
exist.
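
For completeness, the CLUSTER route looks roughly like this (table and index
names made up; the USING form is 8.3 syntax, older releases spell it
CLUSTER indexname ON tablename):

  CLUSTER big_table USING big_table_pkey;   -- rewrites the table in index order
  ANALYZE big_table;                        -- CLUSTER does not update statistics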

There is also the option of doing something like (assuming id is already an
integer -- ie this doesn't actually change the data):

 ALTER TABLE x ALTER id TYPE integer USING id;

which will rewrite the whole table. This is effectively the same as CLUSTER
except it doesn't order the table according to an index. It will still require
enough space to hold two copies of the table but it will be significantly
faster.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres not using array

2008-08-22 Thread Gregory Stark
André Volpato <[EMAIL PROTECTED]> writes:

> Tom Lane escreveu:
>>> We are guessing that a dual core 3.0GHz will beat up a quad core 2.2,
>>> at least in this environmnent with less than 4 concurrent queryes.
>>
>> The most you could hope for from that is less than a 50% speedup.  I'd
>> suggest investing some tuning effort first.  Some rethinking of your
>> schema, for example, might buy you orders of magnitude ... with no new
>> hardware investment.
>
> I think we almost reached the tuning limit, without changing the schema.

It's hard to tell from the plan you posted (and with only a brief look) but it
looks to me like your query with that function is basically doing a join, but
because the inner side of the join is hidden inside your function's index lookup it's
effectively forcing the use of a "nested loop" join. That's usually a good
choice for small queries against big tables but if you're joining a lot of
data there are other join types which are much faster. You might find the
planner can do a better job if you write your query as a plain SQL query and
let the optimizer figure out the best way instead of forcing its hand.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-17 Thread Gregory Stark
"Gregory Stark" <[EMAIL PROTECTED]> writes:

> <[EMAIL PROTECTED]> writes:
>
>>> If you are completely over-writing an entire stripe, there's no reason to
>>> read the existing data; you would just calculate the parity information from
>>> the new data. Any good controller should take that approach.
>>
>> in theory yes, in practice the OS writes usually aren't that large and 
>> aligned,
>> and as a result most raid controllers (and software) don't have the
>> special-case code to deal with it.
>
> I'm pretty sure all half-decent controllers and software do actually. This is
> one major reason that large (hopefully battery backed) caches help RAID-5
> disproportionately. The larger the cache the more likely it'll be able to wait
> until the entire raid stripe is replaced, avoiding having to read in the old
> parity.

Or now that I think about it, replace two or more blocks from the same set of
parity bits. It only has to recalculate the parity bits once for all those
blocks instead of for every single block write.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-17 Thread Gregory Stark
<[EMAIL PROTECTED]> writes:

>> If you are completely over-writing an entire stripe, there's no reason to
>> read the existing data; you would just calculate the parity information from
>> the new data. Any good controller should take that approach.
>
> in theory yes, in practice the OS writes usually aren't that large and 
> aligned,
> and as a result most raid controllers (and software) don't have the
> special-case code to deal with it.

I'm pretty sure all half-decent controllers and software do actually. This is
one major reason that large (hopefully battery backed) caches help RAID-5
disproportionately. The larger the cache the more likely it'll be able to wait
until the entire raid stripe is replaced, avoiding having to read in the old
parity.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Incorrect estimates on correlated filters

2008-08-14 Thread Gregory Stark
"Craig Ringer" <[EMAIL PROTECTED]> writes:

> It strikes me that there are really two types of query hint possible here.
>
> One tells the planner (eg) "prefer a merge join here".
>
> The other gives the planner more information that it might not otherwise
> have to work with, so it can improve its decisions. "The values used in
> this join condition are highly correlated".

This sounds familiar:

http://article.gmane.org/gmane.comp.db.postgresql.devel.general/55730/match=hints

Plus ça change...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using PK value as a String

2008-08-12 Thread Gregory Stark
"Mark Mielke" <[EMAIL PROTECTED]> writes:

>- Increased keyspace. Even if keyspace allocation is performed, an int4 
> only
> has 32-bit of keyspace to allocate. The IPv4 address space is already over 85%
> allocated as an example of how this can happen. 128-bits has a LOT more
> keyspace than 32-bits or 64-bits.

The rest of your points are valid (though not particularly convincing to me
for most applications) but this example is bogus. The IPv4 address space is
congested because of the hierarchical nature of allocations, not because there
is an actual shortage of IPv4 addresses themselves. There would be enough IPv4
for every ethernet device on the planet for decades to come if we could
allocate them individually -- but we can't.

That is, when allocating an organization 100 addresses if they want to be able
to treat them as a contiguous network they must be allocated 128 addresses.
And if they might ever grow to 129 they're better off just justifying 256
addresses today.

That's not an issue for a sequence generated primary key. Arguably it *is* a
problem for UUID which partitions up that 128-bits much the way the original
pre-CIDR IPv4 addressing scheme partitioned up the address. But 128-bits is so
much bigger it avoids running into the issue.

The flip side is that sequence generated keys have to deal with gaps if a record
is deleted later. So the relevant question is not whether you plan to have 2
billion users at any single point in the future but rather whether you plan to
ever have had 2 billion users total over your history. I suspect large
networks like Yahoo or Google might be nearing or past that point now even
though they probably only have a few hundred million current users.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using PK value as a String

2008-08-12 Thread Gregory Stark
"Mario Weilguni" <[EMAIL PROTECTED]> writes:

> UUID is already a surrogate key not a natural key, in no aspect better than a
> numeric key, just taking a lot more space.
>
> So why not use int4/int8?

The main reason to use UUID instead of sequences is if you want to be able to
generate unique values across multiple systems. So, for example, if you want
to be able to send these userids to another system which is taking
registrations from lots of places. Of course that only works if that other
system is already using UUIDs and you're all using good generators.

You only need int8 if you might someday have more than 2 *billion* users...
Probably not an urgent issue.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using PK value as a String

2008-08-11 Thread Gregory Stark
"Jay" <[EMAIL PROTECTED]> writes:

> I have a table named table_Users:
>
> CREATE TABLE table_Users (
>UserID   character(40)  NOT NULL default '',
>Username   varchar(256)  NOT NULL default '',
>Email  varchar(256) NOT NULL default ''
>etc...
> );
>
> The UserID is a character(40) and is generated using UUID function. We
> started making making other tables and ended up not really using
> UserID, but instead using Username as the unique identifier for the
> other tables. Now, we pass and insert the Username to for discussions,
> wikis, etc, for all the modules we have developed. I was wondering if
> it would be a performance improvement to use the 40 Character UserID
> instead of Username when querying the other tables, or if we should
> change the UserID to a serial value and use that to query the other
> tables. Or just keep the way things are because it doesn't really make
> much a difference.

Username would not be any slower than UserID unless you have a lot of
usernames longer than 40 characters.

However, making UserID an integer would be quite a bit more efficient. It would
take 4 bytes instead of the length of the Username, which adds up when it's
stored in all your other tables... Also, internationalized text collations are
quite a bit more expensive than a simple integer comparison.

But the real question here is what's the better design. If you use Username
you'll be cursing if you ever want to provide a facility to allow people to
change their usernames. You may not want such a facility now but one day...
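
To make that concrete, a minimal sketch of the direction I mean (the wiki table
is made up, just to show the reference):

  CREATE TABLE table_Users (
    UserID    serial        PRIMARY KEY,       -- 4-byte surrogate key
    Username  varchar(256)  NOT NULL UNIQUE,   -- still unique, still usable for lookups
    Email     varchar(256)  NOT NULL DEFAULT ''
    -- etc...
  );

  CREATE TABLE wiki_pages (
    PageID  serial   PRIMARY KEY,
    UserID  integer  NOT NULL REFERENCES table_Users(UserID)
  );

A username change is then a one-row UPDATE instead of a change rippling through
every referencing table.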

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] query planner not using the correct index

2008-08-07 Thread Gregory Stark
"Joshua Shanks" <[EMAIL PROTECTED]> writes:

> Those 3 values in reality and in the stats account for 98% of the
> rows. actual distinct values are around 350

Measuring n_distinct from a sample is inherently difficult and unreliable.
When 98% of your table falls into those categories it's leaving very few
chances for the sample to find many other distinct values. 

I haven't seen the whole thread, if you haven't tried already you could try
raising the statistics target for these columns -- that's usually necessary
anyways when you have a very skewed distribution like this.
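
If you want to see what the sample actually captured, pg_stats will show it
(table and column names guessed from later in the thread):

  SELECT attname, n_distinct, most_common_vals, most_common_freqs
    FROM pg_stats
   WHERE tablename = 'bars' AND attname = 'bars_id';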

> It seems like the planner would want to get the result set from
> bars.bars_id condition and if it is big using the index on the join to
> avoid the separate sorting, but if it is small (0-5 rows which is our
> normal case) use the primary key index to join and then just quickly
> sort. Is there any reason the planner doesn't do this?

Yeah, Heikki's suggested having a kind of "branch" plan node that knows where
the break-point is between two plans and can call the appropriate one.
We don't have anything like that yet.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] switchover between index and sequential scans

2008-07-03 Thread Gregory Stark
"Abhijit Menon-Sen" <[EMAIL PROTECTED]> writes:

>  ->  Index Scan using header_fields_message_key on header_fields  
> (cost=0.00..1126.73 rows=325 width=4) (actual time=9.003..12.330 rows=17 
> loops=75)
>Index Cond: (header_fields.message = "outer".message)
>
>  ->  Seq Scan on header_fields  (cost=0.00..85706.78 rows=1811778 
> width=4) (actual time=22.505..29281.553 rows=1812184 loops=1)

It looks to me like it's overestimating the number of rows in the index scan
by 20x and it's overestimating the cost of random accesses by about 100%.
Combined it's overestimating the cost of the index scan by about 40x.

> This machine has only 512MB of RAM, and is running FreeBSD 5.4. It has
> shared_buffers=3072, effective_cache_size=25000, work_mem=sort_mem=2048.
> Changing the last two doesn't seem to have any effect on the plan.

You could try dramatically increasing effective_cache_size to try to convince
it that most of the random accesses are cached. Or you could reach for the
bigger hammer and reduce random_page_cost by about half.

Also, if this box is dedicated you could make use of more than 24M for shared
buffers. Probably something in the region of 64M-128M if your database is large
enough to warrant it.

And increase the statistics target on header_fields and re-analyze?
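
To experiment with the first two suggestions without touching postgresql.conf
(the values are only starting points for a 512MB box):

  SET effective_cache_size = 50000;   -- ~400MB expressed in 8kB pages, up from 25000
  SET random_page_cost = 2;           -- down from the default of 4

then re-run the EXPLAIN ANALYZE in the same session and see whether the plan
flips over to the index scan. shared_buffers, of course, can only be changed in
postgresql.conf with a restart.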

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Subquery WHERE IN or WHERE EXISTS faster?

2008-06-29 Thread Gregory Stark
"Ulrich" <[EMAIL PROTECTED]> writes:

> Hi,
> Yes that looks strange. But it is not possible that I have processors in
> "users_processors" which do not appear in "processors", because
> "users_processors" contains foreign keys to "processors".
>
> If I remove the LIMIT 10 OFFSET 1 the line "Sort (cost= rows=11.."
> disappears and the query return 13 correct processors from "processors". 

Oh, er, my bad. That makes perfect sense. The "actual" numbers can be affected
by what records are actually requested. The LIMIT prevents the records beyond
11 from ever being requested even though they exist. 

While the bitmap heap scan has to fetch all the records even though they don't
all get used, the nested loop only fetches the records as requested.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Subquery WHERE IN or WHERE EXISTS faster?

2008-06-28 Thread Gregory Stark
"Ulrich" <[EMAIL PROTECTED]> writes:

> EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT processorid
> FROM users_processors WHERE userid=4040) ORDER BY speed ASC LIMIT 10 OFFSET 1;
>
> Limit  (cost=113.73..113.75 rows=7 width=5) (actual time=0.335..0.340 rows=10 
> loops=1)
>   ->  Sort  (cost=113.73..113.75 rows=8 width=5) (actual time=0.332..0.333 
> rows=11 loops=1)


  ^^

> Sort Key: processors.speed
> Sort Method:  quicksort  Memory: 17kB
> ->  Nested Loop  (cost=47.22..113.61 rows=8 width=5) (actual 
> time=0.171..0.271 rows=13 loops=1)
>   ->  HashAggregate  (cost=47.22..47.30 rows=8 width=4) (actual 
> time=0.148..0.154 rows=13 loops=1)
> ->  Bitmap Heap Scan on users_processors  
> (cost=4.36..47.19 rows=12 width=4) (actual time=0.074..0.117 rows=13 loops=1)


  ^^

> Index Cond: (userid = 4040)
>   ->  Index Scan using processors_pkey on processors 
> (cost=0.00..8.28 rows=1 width=9) (actual time=0.006..0.007 rows=1 loops=13)
> Index Cond: (processors.id = users_processors.processorid)


It looks to me like you have some processors which appear in
"users_processors" but not in "processors". I don't know your data model but
that sounds like broken referential integrity to me.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] getting estimated cost to agree with actual

2008-06-02 Thread Gregory Stark
"Scott Marlowe" <[EMAIL PROTECTED]> writes:

> On Mon, Jun 2, 2008 at 3:43 PM, Justin <[EMAIL PROTECTED]> wrote:
>>
>> I have noticed that  estimated Cost to do the query is way off from Actual.
>
> Well, they're not measured in the same units.  estimated costs are in
> terms of the cost to sequentially scan a single tuple, while actual
> costs are in milliseconds.

s/tuple/page/

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 2GB or not 2GB

2008-05-31 Thread Gregory Stark
"Josh Berkus" <[EMAIL PROTECTED]> writes:

> Simon,
>
>> There is an optimum for each specific sort.
>
> Well, if the optimum is something other than "as much as we can get", then we 
> still have a pretty serious issue with work_mem, no?

With the sort algorithm. The problem is that the database can't predict the
future and doesn't know how many more records will be arriving and how out of
order they will be.

What appears to be happening is that if you give the tape sort a large amount
of memory it keeps a large heap filling that memory. If that large heap
doesn't actually save any passes and doesn't reduce the number of output tapes
then it's just wasted cpu time to maintain such a large heap. If you have any
clever ideas on how to auto-size the heap based on how many output tapes it
will create or avoid then by all means speak up.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] ProcArrayLock (The Saga continues)

2008-05-29 Thread Gregory Stark
"Jignesh K. Shah" <[EMAIL PROTECTED]> writes:

> Note with this no think time concept, each clients can be about 75% CPU busy
> from what I observed. running it I found the clients scaling up saturates at
> about 60  now (compared to 500 from the original test). The peak throughput 
> was
> at about 50 users (using synchrnous_commit=off)

So to get the maximum throughput on the benchmark with think times you want to
aggregate the clients about 10:1 with a connection pooler or some middleware
layer of some kind, it seems.

It's still interesting to find the choke points for large numbers of
connections. But I think not because it's limiting your benchmark results --
that would be better addressed by using fewer connections -- just for the sake
of knowing where problems loom on the horizon.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 2GB or not 2GB

2008-05-28 Thread Gregory Stark
"Josh Berkus" <[EMAIL PROTECTED]> writes:

> sort_mem: My tests with 8.2 and DBT3 seemed to show that, due to 
> limitations of our tape sort algorithm, allocating over 2GB for a single 
> sort had no benefit.  However, Magnus and others have claimed otherwise.  
> Has this improved in 8.3?

Simon previously pointed out that we have some problems in our tape sort
algorithm with large values of work_mem. If the heap is already "large enough" for
the number of output tapes it generates then increasing the heap size doesn't buy
us any reduction in the future passes. And managing a very large heap is a
fairly large amount of cpu time in itself.

The problem of course is that we never know if it's "large enough". We talked
at one point about having a heuristic where we start the heap relatively small
and double it (adding one row) whenever we find we're starting a new tape. Not
sure how that would work out though.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] "Big O" notation for postgres?

2008-05-22 Thread Gregory Stark
"Richard Huxton" <[EMAIL PROTECTED]> writes:

> Jonah H. Harris wrote:
>> On Wed, May 21, 2008 at 10:10 AM, H. Hall <[EMAIL PROTECTED]> wrote:
>>> Does anyone know if there is a source that provides "Big O" notation for
>>> postgres's aggregate functions and operations?  For example is count(*) =
>>> O(1) or O(n)?
>>
>> I don't know of any document containing the complexity of each
>> aggregate, but it's sometimes left as a comment in the souce code.
>
> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
> presence of an index too, just to muddy the waters.

Hm, true. But excluding those special cases all Postgres aggregate functions
will be O(n) unless they're doing something very odd. None of the built-in
functions (except min/max as noted) do anything odd like that.

The reason is the basic design of Postgres aggregate functions.
They are fed every tuple one by one and have to keep their state in a single
variable. Most of the aggregate functions like count(*) etc just keep a static
non-growing state and the state transition function is a simple arithmetic
operation which is O(1). So the resulting operation is O(n).
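
A trivial hand-rolled example of that pattern, just to make it concrete (a
sketch only -- sum() already exists, obviously):

  CREATE AGGREGATE my_sum(int4) (
    SFUNC    = int4pl,    -- state + next value: an O(1) step per row
    STYPE    = int4,
    INITCOND = '0'
  );

n rows fed through an O(1) transition function gives O(n) overall.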

Actually one exception would be something like

CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, 
INITCOND='{}');

Since the state variable has to keep accumulating more and more stuff the
array_append becomes more and more expensive (it has to generate a new array
so it has to copy the existing stuff). So actually it would be O(n^2).

The only builtin aggregate which looks like it falls in this category would be
xmlagg()

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-15 Thread Gregory Stark
"Luke Lonergan" <[EMAIL PROTECTED]> writes:

> BTW - we've removed HINT bit checking in Greenplum DB and improved the
> visibility caching which was enough to provide performance at the same level
> as with the HINT bit optimization, but avoids this whole "write the data,
> write it to the log also, then write it again just for good measure"
> behavior.
>
> For people doing data warehousing work like the poster, this Postgres
> behavior is miserable.  It should be fixed for 8.4 for sure (volunteers?)

For people doing data warehousing I would think the trick would be to do
something like what we do to avoid WAL logging for tables created in the same
transaction. 

That is, if you're loading a lot of data at the same time then all of that
data is going to be aborted or committed and that will happen at the same
time. Ideally we would find a way to insert the data with the hint bits
already set to committed and mark the section of the table as being only
provisionally extended so other transactions wouldn't even look at those pages
until the transaction commits.

This is similar to the abortive attempt to have the abovementioned WAL logging
trick insert the records pre-frozen. I recall there were problems with that
idea, but I don't recall whether they were insurmountable or just required
more work.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Understanding histograms

2008-04-30 Thread Gregory Stark

"Tom Lane" <[EMAIL PROTECTED]> writes:

> Right.  As a matter of policy we never estimate less than one matching
> row; and I've seriously considered pushing that up to at least two rows
> except when we see that the query condition matches a unique constraint.
> You can get really bad join plans from overly-small estimates.

This is something that needs some serious thought though. In the case of
partitioned tables I've seen someone get badly messed up plans because they
had a couple hundred partitions each of which estimated to return 1 row. In
fact of course they all returned 0 rows except the correct partition. (This
was in a join so no constraint exclusion)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Replication Syatem

2008-04-29 Thread Gregory Stark
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:

>> Any special guideline to follow to make HOT working??
>>
>
> You can do couple of things to benefit from HOT.
>
> 1. HOT addresses a special, but common case where UPDATE operation
> does not change any of the index keys. So check if your UPDATE changes
> any of the index keys. If so, see if you can avoid having index
> involving that column. Of course, I won't advocate dropping an index
> if it would drastically impact your frequently run queries.
>
> 2. You may leave some free space in the heap (fillfactor less than
> 100). My recommendation would be to leave space worth of one row or
> slightly more than that to let first UPDATE be an HOT update.
> Subsequent UPDATEs in the page may reuse the dead row created by
> earlier UPDATEs.
>
> 3. Avoid any long running transactions.

Perhaps we should put this list in the FAQ.
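
Point 2 in SQL, for the archives (table name made up; this only affects pages
written from now on):

  ALTER TABLE hot_updated_table SET (fillfactor = 90);  -- leave ~10% of each page free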

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL

2008-04-28 Thread Gregory Stark
"Josh Berkus" <[EMAIL PROTECTED]> writes:

> Greg,
>
>> What I was referring to by "passing" TPC-E was the criteria for a conformant
>> benchmark run. TPC-C has iirc, only two relevant criteria: "95th percentile
>> response time < 5s" and "average response time < 95th percentile response
>> time". You can pass those even if 1 transaction in 20 takes 10-20s which is
>> more than enough to cover checkpoints and other random sources of 
>> inconsistent
>> performance.
>
> We can do this now.  I'm unhappy because we're at about 1/4 of Oracle
> performance, but we certainly pass -- even with 8.2.

We certainly can pass TPC-C. I'm curious what you mean by 1/4 though? On
similar hardware? Or the maximum we can scale to is 1/4 as large as Oracle?
Can you point me to the actual benchmark runs you're referring to?

But I just made an off-hand comment that I doubt 8.2 could pass TPC-E which
has much more stringent requirements. It has requirements like: 

  the throughput computed over any period of one hour, sliding over the Steady
  State by increments of ten minutes, varies from the Reported Throughput by no
  more than 2%


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [pgsql-advocacy] Benchmarks WAS: Sun Talks about MySQL

2008-04-28 Thread Gregory Stark

"Josh Berkus" <[EMAIL PROTECTED]> writes:

>> I think TPC-E will make both of these major improvements much more important.
>> I suspect it would be hard to get 8.2 to even pass TPC-E due to the 
>> checkpoint
>> dropouts.
>
> You'd be surprised, then.  We're still horribly, horribly lock-bound on TPC-E;
> on anything over 4 cores lock resolution chokes us to death.  See Jignesh's 
> and
> Paul's various posts about attempts to fix this.

Most of those posts have been about scalability issues with extremely large
numbers of sessions. Those are interesting too and they may be limiting our
results in benchmarks which depend on such a configuration (which I don't
think includes TPC-E, but the benchmark Jignesh has been writing about is some
Java application benchmark which may be such a beast) but they don't directly
relate to whether we're "passing" TPC-E.

What I was referring to by "passing" TPC-E was the criteria for a conformant
benchmark run. TPC-C has iirc, only two relevant criteria: "95th percentile
response time < 5s" and "average response time < 95th percentile response
time". You can pass those even if 1 transaction in 20 takes 10-20s which is
more than enough to cover checkpoints and other random sources of inconsistent
performance.

TPC-E has more stringent requirements which explicitly require very consistent
response times and I doubt 8.2 would have been able to pass them. So the
performance limiting factors whether they be i/o, cpu, lock contention, or
whatever don't even come into play. We wouldn't have any conformant results
whatsoever, not even low values limited by contention. 8.3 however should be
in a better position to pass.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] off-topic: SPAM

2008-04-24 Thread Gregory Stark
"Albe Laurenz" <[EMAIL PROTECTED]> writes:

>> The email message sent to [EMAIL PROTECTED] ,
>> pgsql-performance@postgresql.org requires a confirmation to be delivered.
>> Please, answer this email informing the characters that you see in the
>> image below.
>
> Could somebody remove the latter address from the list, please?

Unfortunately that's not the address causing the problem. This is a
particularly stupid spam filter which is just grabbing that address from your
To line.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysterious difference in speed when combining two queries with OR

2008-04-23 Thread Gregory Stark
"Hans Ekbrand" <[EMAIL PROTECTED]> writes:

> No, I just wanted to show the time differences, I haven't used join
> before. Now that you have adviced me to, I have tried your suggestion
> to rewrite B as a union and it works good! Just as fast as the A Query!

You can even do better. If you know the two sets of mid are disjoint you can
use UNION ALL. If not you could remove the two DISTINCTs as the UNION will
take care of removing duplicates.
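
That is, something shaped like this -- tables and conditions entirely made up,
just to show the two variants:

  -- halves known to be disjoint: skip duplicate elimination entirely
  SELECT mid FROM messages WHERE folder = 'inbox'
  UNION ALL
  SELECT mid FROM messages WHERE folder = 'archive';

  -- halves that may overlap: drop the DISTINCTs and let UNION de-duplicate once
  SELECT mid FROM messages WHERE folder = 'inbox'
  UNION
  SELECT mid FROM messages WHERE author = 'hans';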

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Group by more efficient than distinct?

2008-04-18 Thread Gregory Stark
"Francisco Reyes" <[EMAIL PROTECTED]> writes:

> Is there any dissadvantage of using "group by" to obtain a unique list?
>
> On a small dataset the difference was about 20% percent.
>
> Group by
> HashAggregate  (cost=369.61..381.12 rows=1151 width=8) (actual
> time=76.641..85.167 rows=2890 loops=1)

HashAggregate needs to store more values in memory at the same time so it's
not a good plan if you have a lot of distinct values.

But the planner knows that and so as long as your work_mem is set to a
reasonable size (keeping in mind each sort or other operation feels free to
use that much itself even if there are several in the query) and the rows
estimate is reasonably accurate -- here it's mediocre but not dangerously bad
-- then if the planner is picking it, it's probably a good idea.

I'm not sure but I think there are cases where the DISTINCT method wins too.
This is basically a bug, in an ideal world both queries would generate
precisely the same plans since they're equivalent. It's just not a high
priority since we have so many more interesting improvements competing for
time.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] shared_buffers performance

2008-04-14 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Gregory Stark <[EMAIL PROTECTED]> writes:
>> The transition domain where performance drops dramatically as the database
>> starts to not fit in shared buffers but does still fit in filesystem cache.
>
> It looks to me like the knee comes where the DB no longer fits in
> filesystem cache.  

That does seem to make a lot more sense. I think I misread the units of the
size of the accounts table. Reading it again it seems to be in the 1.5G-2G
range for the transition which with indexes and other tables might be starting
to stress the filesystem cache -- though it still seems a little low.

I think if I squint I can see another dropoff at the very small scaling
numbers. That must be the point where the database is comparable to the shared
buffers size. Except then I would expect the green and blue curves to be
pushed to the right a bit rather than just having a shallower slope.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] shared_buffers performance

2008-04-14 Thread Gregory Stark
"Gaetano Mendola" <[EMAIL PROTECTED]> writes:

> The following graph reports the results:
>
> http://img84.imageshack.us/my.php?image=totalid7.png

That's a *fascinating* graph.

It seems there are basically three domains. 

The small domain where the database fits in shared buffers -- though actually
this domain seems to hold until the accounts table is about 1G so maybe it's
more that the *indexes* fit in memory. Here larger shared buffers do clearly
win.

The transition domain where performance drops dramatically as the database
starts to not fit in shared buffers but does still fit in filesystem cache.
Here every megabyte stolen from the filesystem cache makes a *huge*
difference. At a scale factor of 120 or so you're talking about a factor of 4
between each of the shared buffer sizes.

The large domain where the database doesn't fit in filesystem cache. Here it
doesn't make a large difference but the more buffers duplicated between
postgres and the filesystem cache the lower the overall cache effectiveness.

If we used something like either mmap or directio to avoid the double
buffering we would be able to squeeze these into a single curve, as well as
push the dropoff slightly to the right. In theory. 

In practice it would depend on the OS's ability to handle page faults
efficiently in the mmap case, and our ability to do read-ahead and cache
management in the directio case. And it would be a huge increase in complexity
for Postgres and a push into a direction which isn't our "core competency". We
might find that while in theory it should perform better our code just can't
keep up with Linux's and it doesn't.

I'm curious about the total database size for each of the scaling factors
as well as the total of the index sizes. And how much memory Linux says is
being used for filesystem buffers.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance increase with elevator=deadline

2008-04-11 Thread Gregory Stark
"Albe Laurenz" <[EMAIL PROTECTED]> writes:

> This refers to the performance problem reported in
> http://archives.postgresql.org/pgsql-performance/2008-04/msg00052.php
>
> After some time of trial and error we found that changing the I/O scheduling
> algorithm to "deadline" improved I/O performance by a factor 4 (!) for
> this specific load test.

What was the algorithm before?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitioned tables - planner wont use indexes

2008-04-04 Thread Gregory Stark

"kevin kempter" <[EMAIL PROTECTED]> writes:

> that the planner wants to do a sequential scan on each  partition. We do have
> "constraint_elimination = on" set in the  postgresql.conf file.

"constraint_exclusion" btw.


> myDB=# explain SELECT min(logdate) FROM part_master;

Er, yeah. Unfortunately this is just not a kind of query our planner knows how
to optimize when dealing with a partitioned table... yet. There are several
different pieces missing to make this work. There's some hope some of them
might show up for 8.4 but no guarantees.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PG writes a lot to the disk

2008-03-21 Thread Gregory Stark
"Laurent Raufaste" <[EMAIL PROTECTED]> writes:

> All this make me think that PG was setting some bit on every row it
> used, which caused this massive write activity (3MB/s) in the table
> files. I'm talking about approx. 50 SELECT per second for a single
> server.

Well, that's true, it does. But only once per row. So analyze would have set the
bit on every row. You could do the same thing with something lighter like
"select count(*) from ".

Tom's thinking was that you would only expect a high update rate for a short
time until all those bits were set.

Slony's inserts, updates, and deletes count as updates to the table as well.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL NetApp and NFS

2008-03-21 Thread Gregory Stark
"Dawid Kuroczko" <[EMAIL PROTECTED]> writes:

> It is also possible to present block devices from NetApp over iSCSI or FC
> (I am not sure about licensing model though).  You get all the goodies
> like thin provisioning (only non-zero blocks are allocated), snapshots and
> all, but you see it as a block device.  Works fine.

Note that Postgres doesn't expect to get "out of space" errors on writes
specifically because it pre-allocates blocks. So this "thin provisioning"
thing sounds kind of dangerous.

> It is also worth to mention that NetApp utilizes somewhat "copy on write"
> write strategy -- so whenever you modify a block, new version of the block
> is written on its WAFL filesystem.  In practical terms it is quite resilient 
> to
> random writes (and that read performance is not stellar ;)).

Again, Postgres goes to some effort to keep its reads sequential. So this
sounds like it destroys that feature. Random writes don't matter so much
because Postgres has its WAL which it writes sequentially. Writes to data
files aren't in the critical path and can finish after the transaction is
committed.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Gregory Stark

<[EMAIL PROTECTED]> writes:

> On Tue, 18 Mar 2008, Gregory Stark wrote:
>
>> You can have as many parity drives as you want with RAID 5 too.
>
> you can? I've never seen a raid 5 setup with more then a single parity dirve
> (or even the option of having more then one drives worth of redundancy). you
> can have hot-spare drives, but thats a different thing.
>
> what controller/software lets you do this?

Hm, some research shows I may have completely imagined this. I don't see why
you couldn't but I can't find any evidence that this feature exists. I could
have sworn I've seen it before though.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-18 Thread Gregory Stark
<[EMAIL PROTECTED]> writes:

> On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote:
>
>>
>> On 18/03/2008, Craig Ringer <[EMAIL PROTECTED]> wrote:
>>> Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a
>>>  double disk failure (such as during the load imposed by rebuild onto a
>>>  spare) ?
>
> that's why you should use raid6 (allowing for dual failures)

You can have as many parity drives as you want with RAID 5 too.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] best way to run maintenance script

2008-03-15 Thread Gregory Stark
"Vinubalaji Gopal" <[EMAIL PROTECTED]> writes:

> On Fri, 2008-03-14 at 18:37 -0700, Tom Lane wrote:
>> That's only a little bit better.  Read about all the bug fixes you're
>
> Sure - will eventually upgrade it sometime - but it has to wait for
> now :(

Waiting for one of those bugs to bite you is a bad plan.

We're not talking about an upgrade to 8.1, 8.2, or 8.3. We're talking about
taking bug-fixes and security fixes for the release you're already using.

Normally it's just a shutdown and immediate restart. There are exceptions
listed in the 8.0.6 release notes which would require a REINDEX but they don't
affect most people.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-10 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Well, scanning an index to get a count might be significantly faster
> than scanning the main table, but it's hardly "instantaneous".  It's
> still going to take time proportional to the table size.

Hm, Mark's comment about bitmap indexes makes that not entirely true. A bitmap
index can do RLE compression which makes the relationship between the size of
the table and the time taken to scan the index more complex. In the degenerate
case where there are no concurrent updates (assuming you can determine that
quickly) it might actually be constant time.

> Unless they keep a central counter of the number of index entries;
> which would have all the same serialization penalties we've talked
> about before...

Bitmap indexes do in fact have concurrency issues -- arguably they're just a
baroque version of this central counter in this case.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-20 Thread Gregory Stark
"Chris" <[EMAIL PROTECTED]> writes:

>> When I said "obfuscating" I meant it. I'm pretty familiar with sh scripting
>> and I'm not even sure what the && behaviour would do.
>
> It chains commands together so if the first fails the second doesn't happen.

I meant in this case, not in general. That is, does it introduce a subshell?

Sh traditionally has to introduce subshells to implement some of the logical
control and pipe operators. I'm not sure if a simple && is enough to trigger
that, but often it's surprising how quickly that happens.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-20 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Erik Jones <[EMAIL PROTECTED]> writes:
>> On Feb 20, 2008, at 8:14 AM, Gregory Stark wrote:
>>> I would suggest leaving out the && which only obfuscate what's  
>>> going on here.
>>> 
>>> PGOPTIONS=... pg_restore ...
>>> 
>>> would work just as well and be clearer about what's going on.
>
>> Right, that's just an unnecessary habit of mine.
>
> Isn't that habit outright wrong?  ISTM that with the && in there,
> what you're doing is equivalent to
>
>   PGOPTIONS=whatever
>   pg_restore ...
>
> This syntax will set PGOPTIONS for the remainder of the shell session,
> causing it to also affect (say) a subsequent psql invocation.  Which is
> exactly not what is wanted.

When I said "obfuscating" I meant it. I'm pretty familiar with sh scripting
and I'm not even sure what the && behaviour would do. On at least some shells
I think the && will introduce a subshell. In that case the variable setting
would not persist. In bash I think it would, because bash avoids a lot of
subshells that would otherwise be necessary.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-20 Thread Gregory Stark
"Douglas J Hunley" <[EMAIL PROTECTED]> writes:

> On Tuesday 19 February 2008 16:32:02 Erik Jones wrote:
>> pg_restore is a postgres client app that uses libpq to connect and,  
>> thus, will pick up anything in your $PGOPTIONS env variable.  So,
>>
>> PGOPTIONS="-c maintenance_work_mem=512MB" && pg_restore 
>
> now that's just plain cool
>
> /me updates our wiki

I would suggest leaving out the && which only obfuscate what's going on here.

PGOPTIONS=... pg_restore ...

would work just as well and be clearer about what's going on.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Optimizing No matching record Queries

2008-02-13 Thread Gregory Stark
"Pallav Kalva" <[EMAIL PROTECTED]> writes:

> This index would fix this problem, but in general I would like to know: what
> if there are queries where it does an "index scan backward" and there is no
> ORDER BY clause and the query is still bad? Could there be a case like that,
> or does the planner use a backward index scan only when there is an ORDER BY
> ... DESC as well?

I think you're oversimplifying. Basically you were asking the planner for the
most recent record for a given user. The planner had the choice either of

a) going through all the records for a given user and picking the most recent,

or b) scanning the records from most recent to oldest and looking for the
given user.

It was a choice between two evils. If there are a lot of records for the user
then a) will be bad, since it has to scan all of them to find the most recent;
and if there are no records for the user then b) will be bad, because it will
have to scan back through all of the records to the beginning of time.

The suggested index lets it scan the records for the given user from most
recent to oldest without seeing any records for any other user.
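
As a sketch (table and column names are taken from the query quoted elsewhere
in this thread; treat the exact index definition as an assumption, not the
poster's schema):

    -- composite index: all rows for one listing source, ordered by entry date
    CREATE INDEX idx_listing_source_entrydate
        ON listing (fklistingsourceid, entrydate);

    -- scanned backward, this serves the "latest 10 for source 5525" query
    SELECT *
      FROM listing
     WHERE fklistingsourceid = 5525
     ORDER BY entrydate DESC
     LIMIT 10;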

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Optimizing No matching record Queries

2008-02-12 Thread Gregory Stark
"Stephen Denne" <[EMAIL PROTECTED]> writes:

> Pallav Kalva asked
> ...
>> and listing0_.fklistingsourceid=5525
> ...
>> order by listing0_.entrydate desc limit 10;
>
>>->  Index Scan Backward using idx_listing_entrydate on 
>> listing listing0_  (cost=0.00..781557.28 rows=5118 width=107) (actual 
>> time=2113544.412..2113544.412 rows=0 loops=1)
>>  Filter: (fklistingsourceid = 5525)
>
> Would it help to have a combined index on fklistingsourceid, entrydate?

I think that would help. You already have a ton of indexes; you might consider
whether all your queries filter on a listingsourceid and whether you can have
that as a prefix on the existing index.

Another thing to try is raising the stats target on fklistingsourceid and/or
entrydate. The estimate seems pretty poor. It could just be that the
distribution is highly skewed which is a hard case to estimate correctly.
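
For example (a sketch only; the target of 100 is just an illustrative starting
point, and "listing" assumes the table behind the listing0_ alias):

    ALTER TABLE listing ALTER COLUMN fklistingsourceid SET STATISTICS 100;
    ALTER TABLE listing ALTER COLUMN entrydate SET STATISTICS 100;
    ANALYZE listing;   -- rebuild the statistics with the new targets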

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Benchmark Data requested

2008-02-04 Thread Gregory Stark

"Jignesh K. Shah" <[EMAIL PROTECTED]> writes:

> Also, very important: unless you are running the UPDATE FUNCTIONS, which are
> separate queries, all these Q1-Q22 queries are pure "READ-ONLY" queries.
> Traditionally I think PostgreSQL does lack "READ-SPEED", especially since it
> is bottlenecked by the size of the reads it does (BLOCKSIZE). Major databases
> provide multi-block parameters to do multiples of reads/writes in terms of
> blocksizes to reduce IOPS, and for read-only work they also have READ-AHEAD
> or prefetch sizes, which are generally bigger than multi-block or extent
> sizes, to aid reads.

Note that all of these things are necessitated by those databases using direct
i/o of some form or another. The theory is that PostgreSQL doesn't have to
worry about these things because the kernel is taking care of it.

How true that is remains a matter of some debate, and it varies from OS to OS.
But it is definitely true that the OS will do read-ahead for sequential reads,
for example.

Incidentally we found some cases that Solaris was particularly bad at. Is
there anybody in particular that would be interested in hearing about them?
(Not meant to be a knock on Solaris, I'm sure there are other cases Linux or
BSD handle poorly too)


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Benchmark Data requested

2008-02-04 Thread Gregory Stark

"Jignesh K. Shah" <[EMAIL PROTECTED]> writes:

> Then the power run, which essentially runs one query at a time, should be
> able to utilize the full system (especially multi-core systems); unfortunately
> PostgreSQL can use only one core. (Plus, since this is read-only and there is
> no separate disk reader, all other processes are idle.) So the system is
> running at 1/Nth capacity (where N is the number of cores/threads).

Is the whole benchmark like this or is this just one part of it?

Is the i/o system really able to saturate the cpu though?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] RAID arrays and performance

2008-01-29 Thread Gregory Stark
"Kevin Grittner" <[EMAIL PROTECTED]> writes:

> >>> On Tue, Jan 29, 2008 at  9:52 AM, in message
> <[EMAIL PROTECTED]>, Gregory Stark <[EMAIL PROTECTED]>
> wrote: 
>  
> > I got this from a back-of-the-envelope calculation which now that I'm trying
> > to reproduce it seems to be wrong. Previously I thought it was n(n+1)/2 or
> > about n^2/2. So at 16 I would have expected about 128 pending i/o requests
> > before all the drives could be expected to be busy.
>  
> That seems right to me, based on the probabilities of any new
> request hitting an already-busy drive.
>  
> > Now that I'm working it out more carefully I'm getting that the expected
> > number of pending i/o requests before all drives are busy is
> >  n + n/2 + n/3 + ... + n/n
>  
> What's the basis for that?

Well consider when you've reached n-1 drives; the expected number of requests
before you hit the 1 idle drive remaining out of n would be n requests. When
you're at n-2 the expected number of requests before you hit either of the two
idle drives would be n/2. And so on. The last term of n/n would be the first
i/o when all the drives are idle and you obviously only need one i/o to hit an
idle drive.
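
Putting that argument into a formula (H_n is the n-th harmonic number):

    E[\text{requests until all } n \text{ drives are busy}]
        = \sum_{k=1}^{n} \frac{n}{k} = n\,H_n \approx n \ln n

For n = 16 that comes out near the ~44 pending requests quoted elsewhere in
this thread.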

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] RAID arrays and performance

2008-01-29 Thread Gregory Stark

"Matthew" <[EMAIL PROTECTED]> writes:

> On Tue, 29 Jan 2008, Gregory Stark wrote:
>>> This was with 8192 random requests of size 8192 bytes from an 80GB test 
>>> file.
>>> Unsorted requests ranged from 1.8 MB/s with no prefetching to 28MB/s with 
>>> lots
>>> of prefetching. Sorted requests went from 2.4MB/s to 38MB/s. That's almost
>>> exactly 16x improvement for both, and this is top of the line hardware.
>>
>> Neat. The curves look very similar to mine. I also like that with your
>> hardware the benefit maxes out at pretty much exactly where I had
>> mathematically predicted they would ((stripe size)^2 / 2).
>
> Why would that be the case? Does that mean that we can select a stripe size of
> 100GB and get massive performance improvements? Doesn't seem logical to me. To
> me, it maxes out at 16x speed because there are 16 discs.

Sorry, I meant "number of drives in the array" not number of bytes. So with 16
drives you would need approximately 128 random pending i/o operations to
expect all drives to be busy at all times.

I got this from a back-of-the-envelope calculation which now that I'm trying
to reproduce it seems to be wrong. Previously I thought it was n(n+1)/2 or
about n^2/2. So at 16 I would have expected about 128 pending i/o requests
before all the drives could be expected to be busy.

Now that I'm working it out more carefully I'm getting that the expected
number of pending i/o requests before all drives are busy is
 n + n/2 + n/3 + ... + n/n

which is actually n * H(n) which is approximated closely by n * log(n).

That would predict that 16 drives would actually max out at 44.4 pending i/o
requests. It would predict that my three-drive array would max out well below
that at 7.7 pending i/o requests. Empirically neither result seems to match
reality. Other factors must be dominating.

> Amusingly, there appears to be a spam filter preventing my message (with its
> image) getting through to the performance mailing list.

This has been plaguing us for a while. When we figure out whose misconfigured
system is doing it, I expect they'll be banned from the internet for life!

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] RAID arrays and performance

2008-01-29 Thread Gregory Stark
"Matthew" <[EMAIL PROTECTED]> writes:

> On Tue, 4 Dec 2007, Gregory Stark wrote:
>> FWIW I posted some numbers from a synthetic case to pgsql-hackers
>>
>> http://archives.postgresql.org/pgsql-hackers/2007-12/msg00088.php
>...
> This was with 8192 random requests of size 8192 bytes from an 80GB test file.
> Unsorted requests ranged from 1.8 MB/s with no prefetching to 28MB/s with lots
> of prefetching. Sorted requests went from 2.4MB/s to 38MB/s. That's almost
> exactly 16x improvement for both, and this is top of the line hardware. 

Neat. The curves look very similar to mine. I also like that with your
hardware the benefit maxes out at pretty much exactly where I had
mathematically predicted they would ((stripe size)^2 / 2).

> So, this is FYI, and also an added encouragement to implement fadvise
> prefetching in some form or another. How's that going by the way?

I have a patch which implements it for the low-hanging fruit of bitmap index
scans. It does it using an extra trip through the buffer manager, which is the
least invasive approach but not necessarily the best.

Heikki was pushing me to look at changing the buffer manager to support
asynchronous buffer reads. In that scheme you would issue a ReadBufferAsync,
which would give you back a pinned buffer that the scan would have to hold
onto, and then make a new buffer manager call to complete the i/o when it
actually needed the buffer. The ReadBufferAsync call would put the buffer into
some form of i/o-in-progress state.

On systems with only posix_fadvise ReadBufferAsync would issue posix_fadvise
and ReadBufferFinish would issue a regular read. On systems with an effective
libaio the ReadBufferAsync could issue the aio operation and ReadBufferFinish
could then do a aio_return.

The pros of such an approach would be less locking and cpu overhead in the
buffer manager since the second trip would have the buffer handle handy and
just have to issue the read.

The con of such an approach would be the extra shared buffers occupied by
buffers which aren't really needed yet. Also the additional complexity in the
buffer manager with the new i/o in progress state. (I'm not sure but I don't
think we get to reuse the existing i/o in progress state.)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance issues migrating from 743 to 826

2008-01-28 Thread Gregory Stark
"Matthew Lunnon" <[EMAIL PROTECTED]> writes:

> In this case the query takes 6.037 ms to run on 862 and 2.332 to run on 743.

The difference between 2ms and 6ms is pretty negligible. A single context
switch or disk cache miss could throw the results off by that margin in either
direction.

But what plan does 7.4.3 come up with if you set enable_hashjoin = off? I'm
curious whether it comes up with the same nested-loop plan as 8.2 and what
cost it says it has.
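
A minimal sketch of that experiment (only the GUC handling is shown; the query
itself is whatever the original report was running):

    SET enable_hashjoin = off;     -- session-local, not in the config file
    -- EXPLAIN ANALYZE <the original query>;
    RESET enable_hashjoin;         -- back to the default planner settings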

I think you need to find queries which take longer in order to make any
reliable performance comparisons. Note that the configuration parameters here
aren't the same at all; it's possible the change of effective_cache_size from
800k to 2GB is what's changing the cost estimation. I also seem to recall a
change in the arithmetic for calculating nested-loop costs which made it more
aggressive in estimating cache effectiveness.

Incidentally, default_statistics_target=1000 is awfully aggressive. I found in
the past that that caused the statistics table to become much larger and much
slower to access. It may have caused some statistics to be toasted or it may
have just been the sheer volume of data present. It will also make your
ANALYZEs take a lot longer. I would suggest trying 100 first and incrementally
raising it rather than jumping straight to 1000. And preferably only on the
columns which really matter.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] More shared buffers causes lower performances

2007-12-27 Thread Gregory Stark
"Greg Smith" <[EMAIL PROTECTED]> writes:

> The worst time people can run into a performance
> regression is when they're running a popular benchmarking tool.  

Hm, perhaps pg_bench should do a "show debug_assertions" and print a warning
if the answer isn't "off". We could encourage other benchmark software to do
something similar.
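
The check itself would be a one-liner, something like:

    -- a benchmark driver could run this first and warn if the result is 'on'
    SHOW debug_assertions;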

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] More shared buffers causes lower performances

2007-12-27 Thread Gregory Stark
"Alvaro Herrera" <[EMAIL PROTECTED]> writes:

> Tom Lane escribió:
>
>> Currently the docs say that --enable-cassert
>> 
>>  Enables assertion checks in the server, which test for
>>  many "cannot happen" conditions.  This is invaluable for
>>  code development purposes, but the tests slow things down a little.
>> 
>> Maybe we ought to put that more strongly --- s/a little/significantly/,
>> perhaps?
>
> I don't think it will make any difference, because people don't read
> configure documentation.  They read configure --help.

Fwiw I think you're all getting a bit caught up in this one context. While the
slowdown is significant when you take out the stopwatch, under normal
interactive use you're not going to notice your queries being especially slow.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] pg_dump performance

2007-12-27 Thread Gregory Stark
"Jared Mauch" <[EMAIL PROTECTED]> writes:

>   pg_dump is utilizing about 13% of the cpu and the
> corresponding postgres backend is at 100% cpu time.
> (multi-core, multi-cpu, lotsa ram, super-fast disk).
>...
>   pg8.3(beta) with the following variances from default
>
> checkpoint_segments = 300# in logfile segments, min 1, 16MB each
> effective_cache_size = 512MB# typically 8KB each
> wal_buffers = 128MB# min 4, 8KB each
> shared_buffers = 128MB# min 16, at least max_connections*2, 8KB 
> each
> work_mem = 512MB # min 64, size in KB

Fwiw those are pretty unusual numbers. Normally work_mem is much smaller than
shared_buffers, since there is only a single shared_buffers pool for the whole
server while work_mem can be allocated by every query (and every sort within
those queries). If you have ten queries running two sorts each, this setting
of work_mem could consume 10GB.
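
A common pattern (a sketch; the value is illustrative, not a recommendation
from this thread) is to keep the global work_mem modest and raise it only in
the session that actually needs the big sort:

    SET work_mem = '512MB';   -- just for this session
    -- ... run the heavy query or maintenance step ...
    RESET work_mem;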

Raising shared buffers could improve your pg_dump speed. If all the data is in
cache it would reduce the time spent moving data between filesystem cache and
postgres shared buffers.

What made you raise wal_buffers so high? I don't think it hurts but that's a
few orders of magnitude higher than what I would expect to help.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Gregory Stark
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:

> It is pretty much common knowledge that

I think we have too much "common knowledge".

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Dual core Opterons beating quad core Xeons?

2007-12-19 Thread Gregory Stark
"Stuart Bishop" <[EMAIL PROTECTED]> writes:

> For pgbench (PG 8.2 running Ubuntu), the Opteron is getting about 6x TPS
> over the Xeon (3000+ TPS on Opteron vs ~500 on Xeon). Things get a little
> better for Xeon with PG 8.3 (570-540 TPS).

There was a problem in the past which affected Xeons. But I thought it had
been mostly addressed. Xeons are (or were? these things are always changing)
more sensitive to interprocess contention due to their memory architecture
though.

What are you actually doing in these transactions? Are they read-only? If not,
is fsync off? (You don't want that if you care about your data, but you do if
you're trying to benchmark the cpu.)

Are the crappy disks *identical* crappy disks? If they have different
controllers or different drives (or different versions of the OS) then you
might be being deceived by write caching on one set and not the other. If
they're not read-only transactions and fsync is on, then a TPS of 3000+ is not
credible, and this is the likely explanation.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Optimising a query

2007-12-19 Thread Gregory Stark

> Also, try replacing the DISTINCT with GROUP BY. The code path for DISTINCT
> unfortunately needs a bit of cleaning up and isn't exactly equivalent to GROUP
> BY. In particular it doesn't support hash aggregates which, if your work_mem
> is large enough, might work for you here.

Sorry, strike that last suggestion. I was looking at the plan and forgot that
the query had DISTINCT ON. It is possible to replace DISTINCT ON with GROUP BY
but it's not going to be faster than the DISTINCT ON case since you'll need
the sort anyways.

Actually it's possible to do without the sort if you write some fancy
aggregate functions but for this large a query that's going to be awfully
complex.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Optimising a query

2007-12-19 Thread Gregory Stark
"Richard Huxton" <[EMAIL PROTECTED]> writes:

> Paul Lambert wrote:
>
>> "  ->  Sort  (cost=30197.98..30714.85 rows=206748 width=16) (actual >> 
>> time=5949.691..7018.931 rows=206748 loops=1)"
>> "Sort Key: dealer_id, year_id, subledger_id, account_id"
>> "Sort Method:  external merge  Disk: 8880kB"

> Before that though, try issuing a "SET work_mem = '9MB'" before running your
> query. If that doesn't change the plan, step up gradually. You should be able
> to get the sort stage to happen in RAM rather than on disk (see "Sort Method"
> above).

FWIW you'll probably need more than that. Try something more like 20MB.

Also, note you can change this with SET for just this connection and even just
this query and then reset it to the normal value (or use SET LOCAL). You don't
have to change it in the config file and restart the whole server.
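
A minimal sketch of the transaction-local variant:

    BEGIN;
    SET LOCAL work_mem = '20MB';   -- reverts automatically at COMMIT/ROLLBACK
    -- run the big sorting query here
    COMMIT;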

Also, try replacing the DISTINCT with GROUP BY. The code path for DISTINCT
unfortunately needs a bit of cleaning up and isn't exactly equivalent to GROUP
BY. In particular it doesn't support hash aggregates which, if your work_mem
is large enough, might work for you here.
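
As a generic illustration (the table and column are placeholders, not the
poster's schema):

    -- both return the same rows, but only the GROUP BY form can currently be
    -- planned as a hash aggregate
    SELECT DISTINCT dealer_id FROM some_table;
    SELECT dealer_id FROM some_table GROUP BY dealer_id;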

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] libgcc double-free, backend won't die

2007-12-16 Thread Gregory Stark
"Craig James" <[EMAIL PROTECTED]> writes:

> Gregory Stark wrote:
>
>> And the only reason to do that would be to work around one bug in one small
>> range of glibc versions. If you're going to use a multi-threaded library
>> (which isn't very common since it's hard to do safely for all those other
>> reasons) surely using a version of your OS without any thread related bugs is
>> a better idea.
>
> You're jumping ahead. This problem has not been accurately diagnosed yet. It
> could be that the pthreads issue is completely misleading everyone, and in 
> fact
> there is a genuine memory corruption going on here. Or not. We don't know yet.
> I have made zero progress fixing this problem.

Well, no, that would be you jumping ahead then... You proposed Postgres
changing the way it handles threaded libraries based on Tom's suggestion that
your problem was something like the glibc problem previously found. My comment
was based on the known glibc problem. From what you're saying it's far from
certain that the problem would be fixed by changing Postgres's behaviour in
the way you proposed.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] libgcc double-free, backend won't die

2007-12-16 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> James Mansion <[EMAIL PROTECTED]> writes:
>> Is there any particular reason not to ensure that any low-level 
>> threading support in libc is enabled right
>> from the get-go, as a build-time option?
>
> Yes.
> 1) It's of no value to us
> 2) On many platforms there is a nonzero performance penalty

And the only reason to do that would be to work around one bug in one small
range of glibc versions. If you're going to use a multi-threaded library
(which isn't very common since it's hard to do safely for all those other
reasons) surely using a version of your OS without any thread related bugs is
a better idea.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


  1   2   3   >