Re: [PERFORM] Any better plan for this query?..
Simon Riggs writes: > We already know that MySQL favors nested loop joins From what I read I thought that was the *only* type of join MySQL supports. The big picture view here is that whether we run a short query in half a millisecond versus two milliseconds is usually not really important. It could matter if you're concerned with how many transactions/s you can run in a busy server -- but that's not exactly the same thing, and in that case you should measure transaction throughput directly. It would be nice if we were in the same ballpark as MySQL but we would only be interested in such optimizations if they don't come at the expense of scalability under more complex workloads. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
"Jignesh K. Shah" writes: > In next couple of weeks I plan to test the patch on a different x64 based > system to do a sanity testing on lower number of cores and also try out other > workloads ... I'm actually more interested in the large number of cores but fewer processes and lower max_connections. If you set max_connections to 64 and eliminate the wait time you should, in theory, be able to get 100% cpu usage. It would be very interesting to track down the contention which is preventing that. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! - Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High CPU Utilization
Greg Smith writes: > On Mon, 16 Mar 2009, Joe Uhl wrote: > >> Here is vmstat 1 30. We are under peak load right now so I can gather >> information from the real deal > > Quite helpful, reformatting a bit and picking an informative section:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>  r  b   swpd     free   buff    cache   si   so    bi   bo    in    cs us sy id wa
>  0 34  95048 11025880  56988 15020168    0    0  3852  160  3616  8614 11  1  6 82
>  3 25  95048 10996356  57044 15044796    0    0  7892  456  3126  7115  4  3  8 85
>  1 26  95048 10991692  57052 15050100    0    0  5188  176  2566  5976  3  2 12 83
>
> This says that your server is spending all its time waiting for I/O; actual CPU utilization is minimal. You're only achieving around 3-8MB/s of random I/O. That's the reality of what your disk I/O subsystem is capable of, regardless of what its sequential performance with dd looks like. If you were to run a more complicated benchmark like bonnie++ instead, I'd bet that your "seeks/second" results are very low, even though sequential read/write is fine. > > The Perc5 controllers have a pretty bad reputation for performance on this list, even in RAID10. Not much you can do about that beyond scrapping the controller and getting a better one. Hm, well the tests I ran for posix_fadvise were actually on a Perc5 -- though who knows if it was the same under the hood -- and I saw better performance than this. I saw about 4MB/s for a single drive and up to about 35MB/s for 15 drives. However this was using Linux md raid-0, not hardware raid. But you shouldn't get your hopes up too much for random i/o. 3-8MB/s seems low but consider the following:
$ units
2445 units, 71 prefixes, 33 nonlinear units
You have: 8kB / .5|7200min
You want: MB/s
        * 1.92
        / 0.5208
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
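Spelled out, that units invocation is just one 8kB transfer per expected half-rotation of a 7200rpm spindle. The same arithmetic as a quick query, if the notation is opaque (a rough sketch, nothing Perc-specific about it):

    -- one 8kB block per half rotation of a 7200rpm drive, in MB/s
    SELECT 8 / (0.5 / 7200 * 60 * 1000) AS mb_per_sec;
    -- roughly 1.92

Multiply by however many spindles the workload can actually keep busy at once and 3-8MB/s for the whole array is not far off what the physics allows.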
Re: [PERFORM] Postgres benchmarking with pgbench
Greg Smith writes: > On Mon, 16 Mar 2009, Gregory Stark wrote: > >> Why would checkpoints force out any data? It would dirty those pages and then >> sync the files marking them clean, but they should still live on in the >> filesystem cache. > > The bulk of the buffer churn in pgbench is from the statement that updates a > row in the accounts table. That constantly generates updated data block and > index block pages. If you can keep those changes in RAM for a while before > forcing them to disk, you can get a lot of benefit from write coalescing that > goes away if constant checkpoints push things out with a fsync behind them. > > Not taking advantage of that effectively reduces the size of the OS cache, > because you end up with a lot of space holding pending writes that wouldn't > need to happen at all yet were the checkpoints spaced out better. Ok, so it's purely a question of write i/o, not reduced cache effectiveness. I think I could see that. I would be curious to see these results with a larger checkpoint_segments setting. Looking further at the graphs I think they're broken but not in the way I had guessed. It looks like they're *overstating* the point at which the drop occurs. Looking at the numbers it's clear that under 1GB performs well but at 1.5GB it's already dropping to the disk-resident speed. I think pgbench is just not that great a model for real-world usage: a) most real world workloads are limited by read traffic, not write traffic, and certainly not random update write traffic; and b) most real-world workloads follow a less uniform distribution so keeping busy records and index regions in memory is more effective. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres benchmarking with pgbench
Greg Smith writes: > On Mon, 16 Mar 2009, m...@bortal.de wrote: > >> Any idea why my performance collapses at 2GB Database size? I don't understand how you get that graph from the data above. The data above seems to show your test databases at 1.4GB and 2.9GB. There are no 1GB and 2GB data points like the graphs show. Presumably the data point at 2G on the graph should really be at 2.9GB? In which case I don't find it surprising at all that performance would start to shift from RAM-resident before that to disk-resident above that. You have 1GB set aside for shared buffers leaving about 3GB for filesystem cache. You could try setting shared buffers smaller, perhaps 512MB, or larger, perhaps 3.5GB, to minimize the overlap. I would tend to avoid the latter though. One thing to realize is that pgbench performs a completely flat distribution of data accesses. So every piece of data is equally likely to be used. In real life work-loads there are usually some busier and some less busy sections of the database and the cache tends to keep the hotter data resident even as the data set grows. > In your case, you've got shared_buffers=1GB, but the rest of the RAM in the > server isn't so useful to you because you've got checkpoint_segments set to > the > default of 3. That means your system is continuously doing small checkpoints > (check your database log files, you'll see what I mean), which keeps things > from ever really using much RAM before everything has to get forced to disk. Why would checkpoints force out any data? It would dirty those pages and then sync the files marking them clean, but they should still live on in the filesystem cache. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
"Jignesh K. Shah" writes: > Generally when there is dead constant.. signs of classic bottleneck ;-) We > will be fixing one to get to another.. but knocking bottlenecks is the name of > the game I think Indeed. I think the bottleneck we're interested in addressing here is why you say you weren't able to saturate the 64 threads with 64 processes when they're all RAM-resident. >From what I see you still have 400+ processes? Is that right? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4
Robert Haas writes: > On Fri, Mar 13, 2009 at 10:06 PM, Tom Lane wrote: > >> I assume you meant effective_io_concurrency. We'd still need a special >> case because the default is currently hard-wired at 1, not 0, if >> configure thinks the function exists. Also there's a posix_fadvise call >> in xlog.c that that parameter doesn't control anyhow. > > I think 1 should mean no prefetching, rather than 0. If the number of > concurrent I/O requests was 0, that would mean you couldn't perform > any I/O at all. That is actually how I had intended it but apparently I messed it up at some point such that later patches were doing some prefetching at 1 and there was no way to disable it. When Tom reviewed it he corrected the inability to disable prefetching by making 0 disable prefetching. I didn't think it was worth raising as an issue, but were we currently doing prefetching by default? I hadn't realized that. Even on a system with posix_fadvise there's nothing much to be gained unless the data is on a RAID device, so the original objection holds anyways. We shouldn't do any prefetching unless the user tells us to. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
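For anyone experimenting with this, the setting is per-session, so it's easy to flip back and forth; something along these lines (assuming a build where configure found posix_fadvise):

    -- disable prefetching entirely (what 0 means after Tom's review fix)
    SET effective_io_concurrency = 0;

    -- or tell bitmap heap scans roughly how many spindles they can keep
    -- busy with posix_fadvise(WILLNEED) requests issued ahead of the reads
    SET effective_io_concurrency = 4;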
Re: [PERFORM] 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4
Tom Lane writes: > Alan Stange writes: >> Gregory Stark wrote: >>> AFAIK Opensolaris doesn't implement posix_fadvise() so there's no benefit. > >> It's implemented. I'm guessing it's not what you want to see though: >> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/port/gen/posix_fadvise.c > > Ugh. So apparently, we actually need to special-case Solaris to not > believe that posix_fadvise works, or we'll waste cycles uselessly > calling a do-nothing function. Thanks, Sun. Do we? Or do we just document that setting effective_io_concurrency on Solaris won't help? I'm leaning towards the latter because I expect Sun will implement this and there will be people running 8.4 on newer versions of the OS long after it's out. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4
A minute ago I said: AFAIK Opensolaris doesn't implement posix_fadvise() so there's no benefit. It would be great to hear if you could catch the ear of the right people to get an implementation committed. Depending on how the i/o scheduler system is written it might not even be hard -- the Linux implementation of WILLNEED is all of 20 lines. I noticed after sending it that that's slightly unfair. The 20-line function calls another function (which calls another function) to do the real readahead work. That function (mm/readahead.c:__do_page_cache_readahead()) is 48 lines. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
"Jignesh K. Shah" writes: > Gregory Stark wrote: >> Keep in mind when you do this that it's not interesting to test a number of >> connections much larger than the number of processors you have. Once the >> system reaches 100% cpu usage it would be a misconfigured connection pooler >> that kept more than that number of connections open. > > Greg, Unfortuately the problem is that.. I am trying to reach 100% CPU which > I cannot and hence I am increasing the user count :-) The effect of increasing the number of users with a connection pooler would be to decrease the 200ms sleep time to 0. This is all assuming the idle time is *between* transactions. If you have idle time in the middle of transactions things become a lot more tricky. I think we are missing something to deal with that use case. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 8.4 Performance improvements: was Re: Proposal of tunable fix for scalability of 8.4
"Jignesh K. Shah" writes: > Can we do a vote on which specific performance features we want to test? > > Many of the improvements may not be visible through this standard tests so > feedback on testing methology for those is also appreciated. > * Visibility map - Reduce Vacuum overhead - (I think I can time vacuum with > some usage on both databases) Timing vacuum is kind of pointless -- the only thing that matters is whether it's "fast enough". But it is worth saying that good benchmarks should include normal vacuum runs. Benchmarks which don't run long enough to trigger vacuum aren't realistic. > * Prefetch IO with posix_fadvice () - Though I am not sure if it is supported > on UNIX or not (but can be tested by standard tests) Well clearly this is my favourite :) AFAIK Opensolaris doesn't implement posix_fadvise() so there's no benefit. It would be great to hear if you could catch the ear of the right people to get an implementation committed. Depending on how the i/o scheduler system is written it might not even be hard -- the Linux implementation of WILLNEED is all of 20 lines. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
"Jignesh K. Shah" writes: > Scott Carey wrote: >> On 3/12/09 11:37 AM, "Jignesh K. Shah" wrote: >> >> In general, I suggest that it is useful to run tests with a few different >> types of pacing. Zero delay pacing will not have realistic number of >> connections, but will expose bottlenecks that are universal, and less >> controversial > > I think I have done that before so I can do that again by running the users at > 0 think time which will represent a "Connection pool" which is highly > utilized" > and test how big the connection pool can be before the throughput tanks.. This > can be useful for App Servers which sets up connections pools of their own > talking with PostgreSQL. A minute ago I said: Keep in mind when you do this that it's not interesting to test a number of connections much larger than the number of processors you have. Once the system reaches 100% cpu usage it would be a misconfigured connection pooler that kept more than that number of connections open. Let me give another reason to call this misconfigured: Postgres connections are heavyweight and it's wasteful to keep them around but idle. This has a lot in common with the issue with non-persistent connections where each connection is used for only a short amount of time. In Postgres each connection requires a process, which limits scalability on a lot of operating systems already. On many operating systems having thousands of processes in itself would create a lot of issues. Each connection then allocates memory locally for things like temporary table buffers, sorting, hash tables, etc. On most operating systems this memory is not freed back to the system when it hasn't been used recently. (Worse, it's more likely to be paged out and have to be paged in from disk even if it contains only garbage we intend to overwrite!). As a result, having thousands of processes --aside from any contention-- would lead to inefficient use of system resources. Consider for example that if your connections are using 1MB each then a thousand of them are using 1GB of RAM. When only 64MB are actually useful at any time. I bet that 64MB would fit entirely in your processor caches you weren't jumping around in the gigabyte of local memory your thousands of processes' have allocated. Consider also that you're limited to setting relatively small settings of work_mem for fear all your connections might happen to start a sort simultaneously. So (in a real system running arbitrary queries) instead of a single quicksort in RAM you'll often be doing unnecessary on-disk merge sorts using unnecessarily small merge heaps while gigabytes of RAM either go wasted to cover a rare occurrence or are being used to hold other sorts which have been started but context-switched away. To engineer a system intended to handle thousands of simultaneous connections you would want each backend to use the most light-weight primitives such as threads, and to hold the least possible state in local memory. That would look like quite a different system. The locking contention is the least of the issues we would want to deal with to get there. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
"Jignesh K. Shah" writes: > Scott Carey wrote: >> On 3/12/09 11:37 AM, "Jignesh K. Shah" wrote: >> >> In general, I suggest that it is useful to run tests with a few different >> types of pacing. Zero delay pacing will not have realistic number of >> connections, but will expose bottlenecks that are universal, and less >> controversial > > I think I have done that before so I can do that again by running the users at > 0 think time which will represent a "Connection pool" which is highly > utilized" > and test how big the connection pool can be before the throughput tanks.. This > can be useful for App Servers which sets up connections pools of their own > talking with PostgreSQL. Keep in mind when you do this that it's not interesting to test a number of connections much larger than the number of processors you have. Once the system reaches 100% cpu usage it would be a misconfigured connection pooler that kept more than that number of connections open. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Proposal of tunable fix for scalability of 8.4
Grzegorz Jaśkiewicz writes: > So please, don't say that this doesn't make sense because he tested it > against ram disc. That was precisely the point of exercise. What people are tip-toeing around saying, which I'll just say right out in the most provocative way, is that Jignesh has simply *misconfigured* the system. He's contrived to artificially create a lot of unnecessary contention. Optimizing the system to reduce the cost of that artificial contention at the expense of a properly configured system would be a bad idea. It's misconfigured because there are more runnable threads than there are cpus. A lot more. 15 times as many as necessary. If users couldn't run connection poolers on their own the right approach for us to address this contention would be to build one into Postgres, not to re-engineer the internals around the misuse. Ram-resident use cases are entirely valid and worth testing, but in those use cases you would want to have about as many processes as you have processors. The use case where having a larger number of connections than processors makes sense is when they're blocked on disk i/o (or network i/o or whatever else other than cpu). And having it be configurable doesn't mean that it has no cost. Having a test of a user-settable dynamic variable in the middle of a low-level routine could very well have some cost. Just the extra code would have some cost in reduced cache efficiency. It could be that branch prediction and so on save us but that remains to be proven. And as always the question would be whether the code designed for this misconfigured setup is worth the maintenance effort if it's not helping properly configured setups. Consider for example that any work with dtrace to optimize locks under properly configured setups would lead us to make changes which would have to be tested twice, once with and once without this option. What do we do if dtrace says some unrelated change helps systems with this option disabled but hurts systems with it enabled? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL block size for SSD RAID setup?
henk de wit writes: > Hi, > I was reading a benchmark that sets out block sizes against raw IO performance > for a number of different RAID configurations involving high end SSDs (the > Mtron 7535) on a powerful RAID controller (the Areca 1680IX with 4GB RAM). See > http://jdevelopment.nl/hardware/one-dvd-per-second/ You might also be interested in: http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/ http://thunk.org/tytso/blog/2009/02/22/should-filesystems-be-optimized-for-ssds/ It seems you have to do more work than just look at the application. You want the application, the filesystem, the partition layout, and the raid device geometry to all consistently maintain alignment with erase blocks. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Abnormal performance difference between Postgres and MySQL
Farhan Husain writes: > I can provide any other information needed and also the data if anyone > wants. What did the query plans look like in both databases? In Postgres you can get the query plan with EXPLAIN ANALYZE select ... You can leave out the ANALYZE if you can't wait until the query completes but it will have much less information to diagnose any problems. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] TCP network cost
"Ross J. Reedstrom" writes: > On Tue, Feb 17, 2009 at 12:20:02AM -0700, Rusty Conover wrote: >> >> Try running tests with ttcp to eliminate any PostgreSQL overhead and >> find out the real bandwidth between the two machines. If its results >> are also slow, you know the problem is TCP related and not PostgreSQL >> related. > > I did in fact run a simple netcat client/server pair and verified that I > can transfer that file on 0.12 sec localhost (or hostname), 0.35 over the > net, so TCP stack and network are not to blame. This is purely inside > the postgresql code issue, I believe. There's not much Postgres can do to mess up TCP/IP. The only things that come to mind are a) making lots of short-lived connections and b) getting caught by Nagle when doing lots of short operations and blocking waiting on results. What libpq (or other interface) operations are you doing exactly? [also, your Mail-Followup-To has a bogus email address in it. Please don't do that] -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query composite index range in an efficient way
Havasvölgyi Ottó writes: > I also tried Row constructors with a Between expression, but in this case > Postgres handled the elements of the row independently, and this led to > false query result. What version of Postgres is this? row constructors were fixed a long time ago to not do that and the main benefit of that was precisely that this type of expression could use a multi-column index effectively. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
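For reference, the form that works on a fixed version and that can use a multi-column index looks something like this (table, column, and constant values are invented here):

    -- assumes a two-column index, e.g.: CREATE INDEX t_a_b_idx ON t (a, b);
    SELECT *
      FROM t
     WHERE (a, b) >= (10, 100)
       AND (a, b) <= (20, 500);
    -- the rows are compared lexicographically, as whole rows,
    -- not element by element

That is the comparison semantics a composite btree index range scan can satisfy directly.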
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Arjen van der Meijden writes: > When we purchased our Perc 5/e with MD1000 filled with 15 15k rpm sas disks, > my > colleague actually spent some time benchmarking the PERC and a ICP Vortex > (basically a overclocked Adaptec) on those drives. Unfortunately he doesn't > have too many comparable results, but it basically boiled down to quite good > scores for the PERC and a bit less for the ICP Vortex. > IOMeter sequential reads are above 300MB/s for the RAID5 and above 240MB/s for > a RAID10 (and winbench99 versions range from 400+ to 600+MB/s). FWIW those are pretty terrible numbers for fifteen 15k rpm drives. They're about what you would expect for a PCI-X card which was bus-bandwidth limited. A PCI-e card should be able to get about 3x that from the drives. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] suggestions for postgresql setup on Dell 2950 , PERC6i controller
Scott Marlowe writes: > We purchased the Perc 5E, which dell wanted $728 for last fall with 8 > SATA disks in an MD-1000 and the performance is just terrible. No > matter what we do the best throughput on any RAID setup was about 30 > megs/second write and 60 Megs/second read. Is that sequential or a mix of random and sequential (It's too high to be purely random i/o)? A single consumer drive should be able to beat those numbers on sequential i/o. If it's a mix of random and sequential then performance will obviously depend on the mix. > I can get that from a mirror set of the same drives under linux kernel > software RAID. Why is that surprising? I would expect software raid to be able to handle 8 drives perfectly well assuming you had a controller and bus you aren't saturating. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Deleting millions of rows
Robert Haas writes: > That's good if you're deleting most or all of the parent table, but > what if you're deleting 100,000 values from a 10,000,000 row table? > In that case maybe I'm better off inserting all of the deleted keys > into a side table and doing a merge or hash join between the side > table and the child table... It would be neat if we could feed the queued trigger tests into a plan node like a Materialize and use the planner to determine which type of plan to generate. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
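For what it's worth, the manual version of the side-table approach Robert describes looks something like the sketch below (table and column names are invented, and the WHERE clause is just a placeholder). It gives the planner the chance to pick a hash or merge join against the child table instead of 100,000 separate index probes:

    -- collect the keys to be removed, once
    CREATE TEMP TABLE doomed_keys (id int PRIMARY KEY);
    INSERT INTO doomed_keys
        SELECT id FROM parent WHERE created < '2008-01-01';
    ANALYZE doomed_keys;

    -- set-oriented deletes, children first
    DELETE FROM child  USING doomed_keys d WHERE child.parent_id = d.id;
    DELETE FROM parent USING doomed_keys d WHERE parent.id = d.id;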
Re: [PERFORM] Sort performance
Robert Haas writes: > On Thu, Jan 29, 2009 at 3:15 PM, Subbiah Stalin-XCGF84 > wrote: >> >> i see the sort operation spilling to disk writing upto 430MB and then >> return the first 500 rows. Our query is of the sort >> >> Now if set the work_mem to 500MB (i did this in a psql session without >> making it global) and ran the same query. One would think the sort >> operations would happen in memory and not spill to disk but i still see >> 430MB written to disk however, the query complete time dropped down to >> 351Secs. So work_mem did have an impact but wondering why its still writing >> to disk when it can all do it memory. The on-disk storage is more compact than the in-memory storage so you actually need a larger value than the space reported for on-disk storage to avoid the disk sort entirely. The accounting also isn't perfect; the on-disk sort still uses some ram, for example. > What happens if you set work_mem to something REALLY big, like 5GB? Don't set it larger than the available RAM though -- or you'll quite possibly get an out-of-memory error. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
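If you want to see whether a given setting is actually enough, EXPLAIN ANALYZE (on 8.3 and up) reports it on the sort node; something along these lines, where the table, column, and the 600MB figure are only placeholders:

    SET work_mem = '600MB';   -- affects only this session
    EXPLAIN ANALYZE
    SELECT * FROM big_table ORDER BY some_col LIMIT 500;
    -- look at the "Sort Method" line: quicksort or top-N heapsort means it
    -- stayed in memory, "external merge  Disk: ...kB" means it spilled
    RESET work_mem;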
Re: [PERFORM] NOT IN >2hrs vs EXCEPT < 2 sec.
Kevin Traster writes: > Regarding the previous posts about the same issues of PERFORMANCE between > NOT IN versus EXCEPT. There has not been any answer to explain it - just > talk about the differences between the two results. > > Yes, I can still get the results using EXCEPT but it would be nice to know why > I can't get NOT IN to complete the simple query. There are two answers here. One you've already been given, that NOT IN has to handle NULLs specially and that makes these plans not equivalent. The NOT IN is decidedly harder to solve. The other answer is that EXCEPT is a set operation which in Postgres uses a completely different set of logic. Even if you used NOT EXISTS which really is equivalent to EXCEPT the resulting plans would be different. Which one would be better would depend on the circumstances. In an ideal world every equivalent query would generate identical plans. We don't live in an ideal world and Postgres isn't perfect. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
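To make the three spellings concrete (table names invented here), this is the comparison being discussed:

    -- NOT IN: if any bar.id is NULL the condition can never be true for
    -- any row, so the planner has to handle NULLs specially here
    SELECT id FROM foo WHERE id NOT IN (SELECT id FROM bar);

    -- NOT EXISTS: anti-join semantics, no special NULL problem
    SELECT id FROM foo f
     WHERE NOT EXISTS (SELECT 1 FROM bar b WHERE b.id = f.id);

    -- EXCEPT: goes through the set-operation machinery, a different code path
    SELECT id FROM foo
    EXCEPT
    SELECT id FROM bar;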
Re: [PERFORM] SSD performance
da...@lang.hm writes: > they currently have it do a backup immediately on power loss (which is a safe > choice as the contents won't be changing without power), but it then powers > off > (which is not good for startup time afterwards) So if you have a situation where it's power cycling rapidly, each iteration drains the battery for the time it takes to save the state but only charges it for the time the power is on. I wonder how many iterations that gives you. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow query
"Scott Marlowe" writes: > -> Index Scan using users_orgid_idx on users u (cost=0.00..129.52 > rows=5 width=271) (actual time=843.825..860.638 rows=0 loops=35) >Index Cond: (u.orgid = j2.orgid) >Filter: ((u.type_id < 10) AND (u.deleted = 0) AND > ((lower((u.lname)::text) ~~ 'boat%'::text) OR (lower((u.fname)::text) ~~ > 'boat%'::text) OR (lower((u.username)::text) ~~ 'boat%'::text) OR > (lower(u.option1) ~~ 'boat%'::text) OR (lower((u.email)::text) ~~ > '%boat%'::text) OR (lower(u.external_id) = 'boat'::text))) Not sure if this is what's going on but I find the high startup time for this index scan suspicious. Either there are a lot of dead tuples (which would explain the second run being fast if it marks them all as lp_dead) or there are a lot of matching index pointers which fail those other constraints. Assuming it's the latter perhaps some other index definition would let it zero in on the right tuples more quickly instead of having to grovel through a lot of irrelevant rows? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] understanding postgres issues/bottlenecks
Ron writes: > At 10:36 AM 1/10/2009, Gregory Stark wrote: >> >> Or a system crash. If the kernel panics for any reason when it has dirty >> buffers in memory the database will need to be restored. > > A power conditioning UPS should prevent a building wide or circuit level bad > power event Except of course those caused *by* a faulty UPS. Or for that matter by the power supply in the computer or drive array, or someone just accidentally hitting the wrong power button. I'm surprised people are so confident in their kernels though. I know some computers with uptimes measured in years but I know far more which don't. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] understanding postgres issues/bottlenecks
da...@lang.hm writes: > On Sat, 10 Jan 2009, Markus Wanner wrote: > >> My understanding of SSDs so far is, that they are not that bad at >> writing *on average*, but to perform wear-leveling, they sometimes have >> to shuffle around multiple blocks at once. So there are pretty awful >> spikes for writing latency (IIRC more than 100ms has been measured on >> cheaper disks). That would be fascinating. And frightening. A lot of people have been recommending these for WAL disks and this would make them actually *worse* than regular drives. > well, I have one of those cheap disks. > > brand new out of the box, format the 32G drive, then copy large files to it > (~1G per file). this should do almost no wear-leveling, but its write > performance is still poor and it has occasional 1 second pauses. This isn't similar to the way WAL behaves though. What you're testing is the behaviour when the bandwidth to the SSD is saturated. At that point something in the stack, whether the SSD, the USB hardware or driver, or the OS buffer cache, can start to queue up writes. The stalls you see could be the behaviour when that queue fills up and it needs to push back to higher layers. To simulate WAL you want to transfer smaller volumes of data, well below the bandwidth limit of the drive, fsync the data, then pause a bit and repeat. Time each fsync and see whether the time they take is proportional to the amount of data written in the meantime or whether they randomly spike upwards. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] understanding postgres issues/bottlenecks
"Scott Marlowe" writes: > On Sat, Jan 10, 2009 at 5:40 AM, Ron wrote: >> At 03:28 PM 1/8/2009, Merlin Moncure wrote: >>> just be aware of the danger . hard reset (power off) class of failure >>> when fsync = off means you are loading from backups. >> >> That's what redundant power conditioning UPS's are supposed to help prevent >> ;-) > > But of course, they can't prevent them, but only reduce the likelihood > of their occurrance. Everyone who's working in large hosting > environments has at least one horror story to tell about a power > outage that never should have happened. Or a system crash. If the kernel panics for any reason when it has dirty buffers in memory the database will need to be restored. >> ...and of course, those lucky few with bigger budgets can use SSD's and not >> care what fsync is set to. > > Would that prevent any corruption if the writes got out of order > because of lack of fsync? Or partial writes? Or wouldn't fsync still > need to be turned on to keep the data safe. I think the idea is that with SSDs or a RAID with a battery backed cache you can leave fsync on and not have any significant performance hit since the seek times are very fast for SSD. They have limited bandwidth but bandwidth to the WAL is rarely an issue -- just latency. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Poor plan choice in prepared statement
da...@lang.hm writes: > since there is not a pre-parsed interface for queries, it may make sense to > setup a way to have the query pre-parsed, but not pre-planned for cases like > this. What would be more interesting would be to have plans that take into account the outlier values and have alternative plans for those values. One aspect that hasn't been discussed here is whether it's only certain outlier arguments that cause Postgres to choose the poor plan for you or whether it always chooses it for all the sets of arguments you actually use. If it's the former then it's possible you're only being bitten if the first time you prepare the query happens to have one of these outlier set of parameters. I forget what version this went in but I think it was before 8.2 that Postgres started using the first set of arguments seen to plan the query. This is usually an improvement over just guessing but if that first set is unusual it could lead to strange results. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
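If you want to check whether that's what's happening, comparing the plan the prepared statement gets against the plan for the same query with literal values usually makes it obvious; a sketch with invented names:

    -- the plan the prepared statement will use
    PREPARE q(int) AS SELECT * FROM orders WHERE customer_id = $1;
    EXPLAIN EXECUTE q(42);

    -- the plan the planner picks when it can see the actual value
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;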
Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5
"Mark Wong" writes: > Thanks for the input. In a more constructive vein: 1) autovacuum doesn't seem to be properly tracked. It looks like you're just tracking the autovacuum process and not the actual vacuum subprocesses which it spawns. 2) The response time graphs would be more informative if you excluded the ramp-up portion of the test. As it is there are big spikes at the low end but it's not clear whether they're really part of the curve or due to ramp-up. This is especially visible in the stock-level graph where it throws off the whole y scale. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5
"Mark Wong" writes: >> I'm not sure how bad that is for the benchmarks. The only effect that comes >> to >> mind is that it might exaggerate the effects of some i/o intensive operations >> that under normal conditions might not cause any noticeable impact like wal >> log file switches or even checkpoints. > > I'm not sure I'm following. All I'm saying is that the performance characteristics won't be the same when the service wait times are 1-10 seconds rather than the 20-30ms at which alarm bells would start to ring on a real production system. I'm not exactly sure what changes it might make though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5
Mark Wong writes: > On Dec 20, 2008, at 5:33 PM, Gregory Stark wrote: > >> "Mark Wong" writes: >> >>> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark. We >>> are using a 1000 warehouse database, which amounts to about 100GB of >>> raw text data. >> >> Really? Do you get conforming results with 1,000 warehouses? What's the 95th >> percentile response time? > > No, the results are not conforming. You and others have pointed that out > already. The 95th percentile response time are calculated on each page of > the > previous links. Where exactly? Maybe I'm blind but I don't see them. > > I find your questions a little odd for the input I'm asking for. Are you > under the impression we are trying to publish benchmarking results? Perhaps > this is a simple misunderstanding? Hm, perhaps. The "conventional" way to run TPC-C is to run it with larger and larger scale factors until you find out the largest scale factor you can get a conformant result at. In other words the scale factor is an output, not an input variable. You're using TPC-C just as an example workload and looking to see how to maximize the TPM for a given scale factor. I guess there's nothing wrong with that as long as everyone realizes it's not a TPC-C benchmark. Except that if the 95th percentile response times are well above a second I have to wonder whether the situation reflects an actual production OLTP system well. It implies there are so many concurrent sessions that any given query is being context switched out for seconds at a time. I have to imagine that a real production system would consider the system overloaded as soon as queries start taking significantly longer than they take on an unloaded system. People monitor the service wait times and queue depths for i/o systems closely and having several seconds of wait time is a highly abnormal situation. I'm not sure how bad that is for the benchmarks. The only effect that comes to mind is that it might exaggerate the effects of some i/o intensive operations that under normal conditions might not cause any noticeable impact like wal log file switches or even checkpoints. If you have a good i/o controller it might confuse your results a bit when you're comparing random and sequential i/o because the controller might be able to sort requests by physical position better than in a typical oltp environment where the wait queues are too short to effectively do that. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] dbt-2 tuning results with postgresql-8.3.5
"Mark Wong" writes: > To recap, dbt2 is a fair-use derivative of the TPC-C benchmark. We > are using a 1000 warehouse database, which amounts to about 100GB of > raw text data. Really? Do you get conforming results with 1,000 warehouses? What's the 95th percentile response time? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Need help with 8.4 Performance Testing
Matthew Wakeling <[EMAIL PROTECTED]> writes: > On Tue, 9 Dec 2008, Scott Marlowe wrote: >> I wonder how many hard drives it would take to be CPU bound on random >> access patterns? About 40 to 60? And probably 15k / SAS drives to >> boot. Cause that's what we're looking at in the next few years where >> I work. > > There's a problem with that thinking. That is, in order to exercise many > spindles, you will need to have just as many (if not more) concurrent > requests. > And if you have many concurrent requests, then you can spread them over > multiple CPUs. So it's more a case of "How many hard drives PER CPU". It also > becomes a matter of whether Postgres can scale that well. Well:
$ units
2445 units, 71 prefixes, 33 nonlinear units
You have: 8192 byte/5ms
You want: MB/s
        * 1.6384
        / 0.61035156
At 1.6MB/s per drive, if you find Postgres is cpu-bound doing sequential scans at 1GB/s you'll need about 640 drives to keep one cpu satisfied doing random I/O -- assuming you have perfect read-ahead and the read-ahead itself doesn't add cpu overhead. Both of which are false of course, but at least in theory that's what it'll take. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
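Spelling that units output out as a query, in case the notation is opaque:

    -- one 8192-byte block every 5ms of random access, per spindle, in MB/s
    SELECT 8192 / 5.0 / 1000 AS mb_per_sec;   -- 1.6384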
Re: [PERFORM] Need help with 8.4 Performance Testing
Scott Carey <[EMAIL PROTECTED]> writes: > And as far as I can tell, even after the 8.4 fadvise patch, all I/O is in > block_size chunks. (hopefully I am wrong) >... > In addition to the fadvise patch, postgres needs to merge adjacent I/O's > into larger ones to reduce the overhead. It only really needs to merge up to > sizes of about 128k or 256k, and gain a 8x to 16x drop in syscall overhead, > and additionally potentially save code trips down the shared buffer > management code paths. At lest, thats my guess I haven't looked at any code > and could be wrong. There are a lot of assumptions here that I would be interested in seeing experiments to back up. FWIW when I was doing testing of posix_fadvise I did a *lot* of experiments though only with a couple systems. One had a 3-drive array and one with a 15-drive array, both running Linux. I sometimes could speed up the sequential scan by about 10% but not consistently. It was never more than about 15% shy of the highest throughput from dd. And incidentally the throughput from dd didn't seem to depend much at all on the blocksize. On your system does "dd bs=8k" and "dd bs=128k" really have an 8x performance difference? In short, at least from the evidence available, this all seems like it might be holdover beliefs from the olden days of sysadmining where syscalls were much slower and OS filesystem caches much dumber. I'm still interested in looking into it but I'll have to see actual vmstat or iostat output while it's happening, preferably some oprofile results too. And how many drives do you actually need to get into this situation. Also, what is the output of "vacuum verbose" on the table? > Additionally, the "If your operating system has any reasonable caching > itself" comment earlier in this conversation --- Linux (2.6.18, Centos 5.2) > does NOT. I can easily make it spend 100% CPU in system time trying to > figure out what to do with the system cache for an hour. Just do large > seqscans with memory pressure from work_mem or other forces that the OS will > not deem 'idle'. Once the requested memory is ~75% of the system total, it > will freak out. Linux simply will not give up that last 25% or so of the RAM > for anything but page cache This seems like just a misconfigured system. Linux and most Unixen definitely expect to have a substantial portion of RAM dedicated to disk cache. Keep in mind all your executable pages count towards this page cache too. You can adjust this to some extent with the "swappiness" variable in Linux -- but I doubt you'll be happy with the results regardless. > The other way around (small shared_buffers, let the OS do it) hurts > performance overall quite a bit -- randomly accessed pages get pushed out to > the OS cache more often, and the OS tosses thouse out when a big seqscan > occurs, resulting in a lot more random access from disk and more disk bound > periods of time. Great wonder, this operating system caching, eh? How do you observe this? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Need help with 8.4 Performance Testing
Greg Smith <[EMAIL PROTECTED]> writes: > On Mon, 8 Dec 2008, Merlin Moncure wrote: > >> I wonder if shared_buffers has any effect on how far you can go before >> you hit the 'tipping point'. > > If your operating system has any reasonable caching itself, not so much at > first. As long as the index on the account table fits in shared_buffers, even > the basic sort of caching logic an OS uses is perfectly functional for > swapping > the individual pages of the account table in and out, the main limiting factor > on pgbench performance. I would expect higher shared_buffers to raise the curve before the first breakpoint but after the first breakpoint make the drop steeper and deeper. The equilibrium where the curve becomes flatter should be lower. That is, as long as the database fits entirely in RAM having more of the buffers be immediately in shared buffers is better. Once there's contention for the precious cache stealing some of it for duplicated buffers will only hurt. > There is a further out tipping point I've theorized about but not really > explored: the point where even the database indexes stop fitting in memory > usefully. As you get closer to that, I'd expect that the clock sweep > algorithm > used by shared_buffers should make it a bit more likely that those important > blocks would hang around usefully if you put them there, rather than giving > most of the memory to the OS to manage. Hm, sounds logical. At that point the slow drop-off should become even shallower and possibly become completely flat. Greater shared_buffers might start helping again at that point. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partition table query performance
"Greg Jaman" <[EMAIL PROTECTED]> writes: > I have a problem with partitioning and I'm wondering if anyone can provide > some insight. I'm trying to find the max value of a column across multiple > partitions. The query against the partition set is quite slow while queries > against child partitions is very fast! I'm afraid this is a known problematic use case of Postgres's current partitioning support. Postgres is not capable of finding the plan which you're undoubtedly looking for where it uses the same plan as your child table query iterating over the partitions. There are several groups working to improve this in different ways but none of them appear to be on track to be in 8.4 so it will be 8.5 or later before they appear. Sorry. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] seq scan over 3.3 million rows instead of single key index access
"Andrus" <[EMAIL PROTECTED]> writes: > There are indexes on rid(dokumnr) and dok(dokumnr) and dokumnr is int. > Instead of using single key index, 8.1.4 scans over whole rid table. > Sometimes idtelluued can contain more than single row so replacing join with > equality is not possible. > > How to fix ? Firstly the current 8.1 release is 8.1.15. Any of the bugs fixed in those 11 releases might be related to this. Secondly: > CREATE TEMP TABLE idtellUued(dokumnr INT) ON COMMIT DROP; > INSERT INTO idtellUued VALUES(1249228); > explain analyze select 1 > from dok JOIN rid USING(dokumnr) > JOIN idtellUued USING(dokumnr) > > " -> Seq Scan on idtelluued (cost=0.00..31.40 rows=2140 > width=4) > (actual time=0.006..0.011 rows=1 loops=1)" The planner thinks there are 2,140 rows in that temporary table so I don't believe this is from the example posted. I would suggest running ANALYZE idtellUued at some point before the problematic query. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query planner cost estimate less than the sum of its parts?
"Scott Carey" <[EMAIL PROTECTED]> writes: > Certainly, a cost estimate that is ... LESS than one of the sub sections of > the query is wrong. This was one hell of a broken query, but it at least > should have taken an approach that was not a nested loop, and I'm curious if > that choice was due to a bad estimate here. > > Nested Loop IN Join (cost=0.00..3850831.86 rows=128266 width=8) Because it's an IN join it doesn't have to run the inner join to completion. Once it finds a match it can return the outer tuple and continue to the next outer tuple. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index usage problem on 8.3.3
Tom Lane <[EMAIL PROTECTED]> writes: > Jeff Frost <[EMAIL PROTECTED]> writes: >> Tom Lane wrote: >>> Huh. That does sound like it's a version-to-version difference. >>> There's nothing in the CVS log that seems related though. Are you >>> willing to post your test case? >>> >> It's a customer DB, so I'll contact them and see if we can boil it down >> to a test case with no sensitive data. > > Well, if there was a change it seems to have been in the right direction > ;-) so this is mostly just idle curiosity. Don't jump through hoops to > get a test case. Assuming it's not a bug... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SAN and full_page_writes
"Nikolas Everett" <[EMAIL PROTECTED]> writes: > Sounds kind of scary. I think postgres forces the underlying OS and file > system to do that stuff (sans the mutli-host magic) using fsync. Is that > right? Yes, so you have to make sure that your filesystem really does support fsync properly. I think most NFS implementations do that. I was more concerned with: Network Appliance supports a number of NFS client implementations for use with databases. These clients provide write atomicity to at least 4 KB, and support synchronous writes when requested by the database. Typically, atomicity is guaranteed only to one virtual memory page, which may be as small as 4 KB. However, if the NFS client supports a direct I/O mode that completely bypasses the cache, then atomicity is guaranteed to the size specified by the “wsize” mount option, typically 32 KB. The failure of some NFS clients to assure write atomicity to a full database block means that the soft atomicity requirement is not always met. Some failures of the host system may result in a fractured database block on disk. In practice such failures are rare. When they happen no data is lost, but media recovery of the affected database block may be required That "media recovery" it's referring to sounds like precisely our WAL full page writes... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SAN and full_page_writes
"Nikolas Everett" <[EMAIL PROTECTED]> writes: > Thanks for pointing that out Bruce. > > NetApp has a 6 page PDF about NetApp and databases. On page 4: Skimming through this I think all 6 pages are critical. The sentence you quote out of context pertains specifically to the NAS internal organization. The previous pages discuss limitations of OSes, filesystems and especially NFS clients which you may have to be concerned with as well. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] limit clause breaks query planner?
Tom Lane <[EMAIL PROTECTED]> writes: > Guillaume Cottenceau <[EMAIL PROTECTED]> writes: >> It seems to me that if the correlation is 0.99[1], and you're >> looking for less than 1% of rows, the expected rows may be at the >> beginning or at the end of the heap? > > Right, but if you know the value being searched for, you could then > estimate where it is in the table by consulting the histogram. > > Actually, an even easier hack (which would have the nice property of not > needing to know the exact value being searched for), would simply use > the existing cost estimates if the WHERE variables have low correlation > (meaning the random-locations assumption is probably good), but apply > some sort of penalty factor if the correlation is high. Fwiw this will have all the same problems our existing uses of the correlation have. That doesn't mean we shouldn't do it but I would expect it to be improved along with the other uses when we find a better metric. I did happen to speak to a statistician the other day and was given some terms to google. I'll investigate and see if I get anything useful. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] limit clause breaks query planner?
>>>> "Limit (cost=0.00..3.68 rows=15 width=128) (actual >>>> time=85837.043..85896.140 rows=15 loops=1)" >>>> " -> Seq Scan on my_table this_ (cost=0.00..258789.88 rows=1055580 >>>> width=128) (actual time=85837.038..85896.091 rows=15 loops=1)" >>>> "Filter: ((A IS NULL) AND ((B)::text = '21'::text))" >>>> "Total runtime: 85896.214 ms" Postgres does collect and use statistics about what fraction of the "A" column is null. It also collects and uses statistics about what fraction of the "B" column is 21 (using a histogram). And it does take the LIMIT into account. I think the other poster might well be right about this table being extremely bloated. You could test by running and posting the results of: VACUUM VERBOSE my_table What it doesn't collect is where in the table those records are -- so if there are a lot of them then it might use a sequential scan regardless of whether they're at the beginning or end of the table. That seems unlikely to be the problem though. The other thing it doesn't collect is how many of the B=21 records have null As. So if a large percentage of the table has A as null then it will assume that's true for the B=21 records and if there are a lot of B=21 records then it will assume a sequential scan will find matches quickly. If in fact the two columns are highly correlated and B=21 records almost never have A null whereas records with other values of B have lots of null values then Postgres might make a bad decision here. Also, it only has the statitics for B=21 via a histogram. If the distribution of B is highly skewed so that, for example values between 20 and 25 are very common but B=21 happens to be quite rare then Postgres might get a bad estimate here. You could improve this by raising the statistics target for the B column and re-analyzing. That brings up another question -- when was the last time this table was analyzed? What estimates and actual results does postgres get for simple queries like: EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL; EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE B=21; EXPLAIN ANALYZE SELECT count(*) FROM my_table WHERE A IS NULL AND B=21; -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow update of index during insert/copy
"Scott Carey" <[EMAIL PROTECTED]> writes: > On Raid Controllers and Dev machines: > > For a dev machine the battery backup is NOT needed. > > Battery back up makes a _production_ system faster: In production, data > integrity is everything, and write-back caching is dangerous without a > battery back up. > > So: > Without BBU: Write-through cache = data safe in power failure; Write back > cache = not safe in power failure. > With BBU : Both modes are safe on power loss. This could be read the wrong way. With a BBU it's not that you can run the drives in write-back mode safely. It's that you can cache in the BBU safely. The drives still need to have their write caches off (ie, in write-through mode). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Big delete on big table... now what?
"Bill Moran" <[EMAIL PROTECTED]> writes: > "Fernando Hevia" <[EMAIL PROTECTED]> wrote: >> >> Hi list. >> >> I have a table with over 30 million rows. Performance was dropping steadily >> so I moved old data not needed online to an historic table. Now the table >> has about 14 million rows. I don't need the disk space returned to the OS >> but I do need to improve performance. Will a plain vacuum do or is a vacuum >> full necessary? >> ¿Would a vacuum full improve performance at all? > > If you can afford the downtime on that table, cluster would be best. > > If not, do the normal vacuum and analyze. This is unlikely to improve > the performance much (although it may shrink the table _some_) but > regular vacuum will keep performance from getting any worse. Note that CLUSTER requires enough space to store the new and the old copies of the table simultaneously. That's the main reason for VACUUM FULL to still exist. There is also the option of doing something like (assuming id is already an integer -- ie this doesn't actually change the data): ALTER TABLE x ALTER id TYPE integer USING id; which will rewrite the whole table. This is effectively the same as CLUSTER except it doesn't order the table according to an index. It will still require enough space to hold two copies of the table but it will be significantly faster. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres not using array
André Volpato <[EMAIL PROTECTED]> writes: > Tom Lane escreveu: >>> We are guessing that a dual core 3.0GHz will beat up a quad core 2.2, >>> at least in this environmnent with less than 4 concurrent queryes. >> >> The most you could hope for from that is less than a 50% speedup. I'd >> suggest investing some tuning effort first. Some rethinking of your >> schema, for example, might buy you orders of magnitude ... with no new >> hardware investment. > > I think we almost reached the tuning limit, without changing the schema. It's hard to tell from the plan you posted (and with only a brief look) but it looks to me like your query with that function is basically doing a join but because the inner side of the join is in your function's index lookup it's effectively forcing the use of a "nested loop" join. That's usually a good choice for small queries against big tables but if you're joining a lot of data there are other join types which are much faster. You might find the planner can do a better job if you write your query as a plain SQL query and let the optimizer figure out the best way instead of forcing its hand. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server
"Gregory Stark" <[EMAIL PROTECTED]> writes: > <[EMAIL PROTECTED]> writes: > >>> If you are completely over-writing an entire stripe, there's no reason to >>> read the existing data; you would just calculate the parity information from >>> the new data. Any good controller should take that approach. >> >> in theory yes, in practice the OS writes usually aren't that large and >> aligned, >> and as a result most raid controllers (and software) don't have the >> special-case code to deal with it. > > I'm pretty sure all half-decent controllers and software do actually. This is > one major reason that large (hopefully battery backed) caches help RAID-5 > disproportionately. The larger the cache the more likely it'll be able to wait > until the entire raid stripe is replaced avoid having to read in the old > parity. Or now that I think about it, replace two or more blocks from the same set of parity bits. It only has to recalculate the parity bits once for all those blocks instead of for every single block write. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server
<[EMAIL PROTECTED]> writes: >> If you are completely over-writing an entire stripe, there's no reason to >> read the existing data; you would just calculate the parity information from >> the new data. Any good controller should take that approach. > > in theory yes, in practice the OS writes usually aren't that large and > aligned, > and as a result most raid controllers (and software) don't have the > special-case code to deal with it. I'm pretty sure all half-decent controllers and software do actually. This is one major reason that large (hopefully battery backed) caches help RAID-5 disproportionately. The larger the cache the more likely it'll be able to wait until the entire raid stripe is replaced avoid having to read in the old parity. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Incorrect estimates on correlated filters
"Craig Ringer" <[EMAIL PROTECTED]> writes: > It strikes me that there are really two types of query hint possible here. > > One tells the planner (eg) "prefer a merge join here". > > The other gives the planner more information that it might not otherwise > have to work with, so it can improve its decisions. "The values used in > this join condition are highly correlated". This sounds familiar: http://article.gmane.org/gmane.comp.db.postgresql.devel.general/55730/match=hints Plus ça change... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using PK value as a String
"Mark Mielke" <[EMAIL PROTECTED]> writes: >- Increased keyspace. Even if keyspace allocation is performed, an int4 > only > has 32-bit of keyspace to allocate. The IPv4 address space is already over 85% > allocated as an example of how this can happen. 128-bits has a LOT more > keyspace than 32-bits or 64-bits. The rest of your points are valid (though not particularly convincing to me for most applications) but this example is bogus. The IPv4 address space is congested because of the hierarchic nature of allocations. Not because there is an actual shortage of IPv4 addresses themselves. There would be enough IPv4 for every ethernet device on the planet for decades to come if we could allocate them individually -- but we can't. That is, when allocating an organization 100 addresses if they want to be able to treat them as a contiguous network they must be allocated 128 addresses. And if they might ever grow to 129 they're better off just justifying 256 addresses today. That's not an issue for a sequence generated primary key. Arguably it *is* a problem for UUID which partitions up that 128-bits much the way the original pre-CIDR IPv4 addressing scheme partitioned up the address. But 128-bits is so much bigger it avoids running into the issue. The flip side is that sequence generated keys have to deal with gaps if record is deleted later. So the relevant question is not whether you plan to have 2 billion users at any single point in the future but rather whether you plan to ever have had 2 billion users total over your history. I suspect large networks like Yahoo or Google might be nearing or past that point now even though they probably only have a few hundred million current users. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using PK value as a String
"Mario Weilguni" <[EMAIL PROTECTED]> writes: > UUID is already a surrogate key not a natural key, in no aspect better than a > numeric key, just taking a lot more space. > > So why not use int4/int8? The main reason to use UUID instead of sequences is if you want to be able to generate unique values across multiple systems. So, for example, if you want to be able to send these userids to another system which is taking registrations from lots of places. Of course that only works if that other system is already using UUIDs and you're all using good generators. You only need int8 if you might someday have more than 2 *billion* users... Probably not an urgent issue. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using PK value as a String
"Jay" <[EMAIL PROTECTED]> writes: > I have a table named table_Users: > > CREATE TABLE table_Users ( >UserID character(40) NOT NULL default '', >Username varchar(256) NOT NULL default '', >Email varchar(256) NOT NULL default '' >etc... > ); > > The UserID is a character(40) and is generated using UUID function. We > started making making other tables and ended up not really using > UserID, but instead using Username as the unique identifier for the > other tables. Now, we pass and insert the Username to for discussions, > wikis, etc, for all the modules we have developed. I was wondering if > it would be a performance improvement to use the 40 Character UserID > instead of Username when querying the other tables, or if we should > change the UserID to a serial value and use that to query the other > tables. Or just keep the way things are because it doesn't really make > much a difference. Username would not be any slower than UserID unless you have a lot of usernames longer than 40 characters. However making UserID an integer would be quite a bit more efficient. It would take 4 bytes instead of as the length of the Username which adds up when it's in all your other tables... Also internationalized text collations are quite a bit more expensive than a simple integer comparison. But the real question here is what's the better design. If you use Username you'll be cursing if you ever want to provide a facility to allow people to change their usernames. You may not want such a facility now but one day... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] query planner not using the correct index
"Joshua Shanks" <[EMAIL PROTECTED]> writes: > Those 3 values in reality and in the stats account for 98% of the > rows. actual distinct values are around 350 Measuring n_distinct from a sample is inherently difficult and unreliable. When 98% of your table falls into those categories it's leaving very few chances for the sample to find many other distinct values. I haven't seen the whole thread, if you haven't tried already you could try raising the statistics target for these columns -- that's usually necessary anyways when you have a very skewed distribution like this. > It seems like the planner would want to get the result set from > bars.bars_id condition and if it is big using the index on the join to > avoid the separate sorting, but if it is small (0-5 rows which is our > normal case) use the primary key index to join and then just quickly > sort. Is there any reason the planner doesn't do this? Yeah, Heikki's suggested having a kind of "branch" plan node that knows how where the break-point is between two plans and can call the appropriate one. We don't have anything like that yet. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] switchover between index and sequential scans
"Abhijit Menon-Sen" <[EMAIL PROTECTED]> writes: > -> Index Scan using header_fields_message_key on header_fields > (cost=0.00..1126.73 rows=325 width=4) (actual time=9.003..12.330 rows=17 > loops=75) >Index Cond: (header_fields.message = "outer".message) > > -> Seq Scan on header_fields (cost=0.00..85706.78 rows=1811778 > width=4) (actual time=22.505..29281.553 rows=1812184 loops=1) It looks to me like it's overestimating the number of rows in the index scan by 20x and it's overestimating the cost of random accesses by about 100%. Combined it's overestimating the cost of the index scan by about 40x. > This machine has only 512MB of RAM, and is running FreeBSD 5.4. It has > shared_buffers=3072, effective_cache_size=25000, work_mem=sort_mem=2048. > Changing the last two doesn't seem to have any effect on the plan. You could try dramatically increasing effective_cache_size to try to convince it that most of the random accesses are cached. Or you could reach for the bigger hammer and reduce random_page_cost by about half. Also, if this box is dedicated you could make use of more than 24M for shared buffers. Probably something in the region 64M-128M if your database is large enough to warrant it. And increase the statistics target on header_fields and re-analyze? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Subquery WHERE IN or WHERE EXISTS faster?
"Ulrich" <[EMAIL PROTECTED]> writes: > Hi, > Yes that looks strange. But it is not possible that I have processors in > "users_processors" which do not appear in "processors", because > "users_processors" contains foreign keys to "processors". > > If I remove the LIMIT 10 OFFSET 1 the line "Sort (cost= rows=11.." > disappears and the query return 13 correct processors from "processors". Oh, er, my bad. That makes perfect sense. The "actual" numbers can be affected by what records are actually requested. The LIMIT prevents the records beyond 11 from ever being requested even though they exist. While the bitmap heap scan has to fetch all the records even though they don't all get used, the nested loop only fetches the records as requested. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Subquery WHERE IN or WHERE EXISTS faster?
"Ulrich" <[EMAIL PROTECTED]> writes: > EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT processorid > FROM users_processors WHERE userid=4040) ORDER BY speed ASC LIMIT 10 OFFSET 1; > > Limit (cost=113.73..113.75 rows=7 width=5) (actual time=0.335..0.340 rows=10 > loops=1) > -> Sort (cost=113.73..113.75 rows=8 width=5) (actual time=0.332..0.333 > rows=11 loops=1) ^^ > Sort Key: processors.speed > Sort Method: quicksort Memory: 17kB > -> Nested Loop (cost=47.22..113.61 rows=8 width=5) (actual > time=0.171..0.271 rows=13 loops=1) > -> HashAggregate (cost=47.22..47.30 rows=8 width=4) (actual > time=0.148..0.154 rows=13 loops=1) > -> Bitmap Heap Scan on users_processors > (cost=4.36..47.19 rows=12 width=4) (actual time=0.074..0.117 rows=13 loops=1) ^^ > Index Cond: (userid = 4040) > -> Index Scan using processors_pkey on processors > (cost=0.00..8.28 rows=1 width=9) (actual time=0.006..0.007 rows=1 loops=13) > Index Cond: (processors.id = users_processors.processorid) It looks to me like you have some processors which appear in "users_processors" but not in "processors". I don't know your data model but that sounds like broken referential integrity to me. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] getting estimated cost to agree with actual
"Scott Marlowe" <[EMAIL PROTECTED]> writes: > On Mon, Jun 2, 2008 at 3:43 PM, Justin <[EMAIL PROTECTED]> wrote: >> >> I have noticed that estimated Cost to do the query is way off from Actual. > > Well, they're not measured in the same units. estimated costs are in > terms of the cost to sequentially scan a single tuple, while actual > costs are in milliseconds. s/tuple/page/ -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 2GB or not 2GB
"Josh Berkus" <[EMAIL PROTECTED]> writes: > Simon, > >> There is an optimum for each specific sort. > > Well, if the optimum is something other than "as much as we can get", then we > still have a pretty serious issue with work_mem, no? With the sort algorithm. The problem is that the database can't predict the future and doesn't know how many more records will be arriving and how out of order they will be. What appears to be happening is that if you give the tape sort a large amount of memory it keeps a large heap filling that memory. If that large heap doesn't actually save any passes and doesn't reduce the number of output tapes then it's just wasted cpu time to maintain such a large heap. If you have any clever ideas on how to auto-size the heap based on how many output tapes it will create or avoid then by all means speak up. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] ProcArrayLock (The Saga continues)
"Jignesh K. Shah" <[EMAIL PROTECTED]> writes: > Note with this no think time concept, each clients can be about 75% CPU busy > from what I observed. running it I found the clients scaling up saturates at > about 60 now (compared to 500 from the original test). The peak throughput > was > at about 50 users (using synchrnous_commit=off) So to get the maximum throughput on the benchmark with think times you want to aggregate the clients about 10:1 with a connection pooler or some middleware layer of some kind, it seems. It's still interesting to find the choke points for large numbers of connections. But I think not because it's limiting your benchmark results -- that would be better addressed by using fewer connections -- just for the sake of knowing where problems loom on the horizon. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 2GB or not 2GB
"Josh Berkus" <[EMAIL PROTECTED]> writes: > sort_mem: My tests with 8.2 and DBT3 seemed to show that, due to > limitations of our tape sort algorithm, allocating over 2GB for a single > sort had no benefit. However, Magnus and others have claimed otherwise. > Has this improved in 8.3? Simon previously pointed out that we have some problems in our tape sort algorithm with large values of work_mem. If the tape is "large enough" to generate some number of output tapes then increasing the heap size doesn't buy us any reduction in the future passes. And managing very large heaps is a fairly large amount of cpu time itself. The problem of course is that we never know if it's "large enough". We talked at one point about having a heuristic where we start the heap relatively small and double it (adding one row) whenever we find we're starting a new tape. Not sure how that would work out though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] "Big O" notation for postgres?
"Richard Huxton" <[EMAIL PROTECTED]> writes: > Jonah H. Harris wrote: >> On Wed, May 21, 2008 at 10:10 AM, H. Hall <[EMAIL PROTECTED]> wrote: >>> Does anyone know if there is a source that provides "Big O" notation for >>> postgres's aggregate functions and operations? For example is count(*) = >>> O(1) or O(n)? >> >> I don't know of any document containing the complexity of each >> aggregate, but it's sometimes left as a comment in the souce code. > > Recent max() and min() can be O(n) or O(1) depending on the where-clause and > presence of an index too, just to muddy the waters. Hm, true. But excluding those special cases all Postgres aggregate functions will be O(n) unless they're doing something very odd. None of the built-in functions (except min/max as noted) do anything odd like that. The reason way is because of the basic design of Postgres aggregate functions. They are fed every tuple one by one and have to keep their state in a single variable. Most of the aggregate functions like count(*) etc just keep a static non-growing state and the state transition function is a simple arithmetic operation which is O(1). So the resulting operation is O(n). Actually one exception would be something like CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}'); Since the state variable has to keep accumulating more and more stuff the array_append becomes more and more expensive (it has to generate a new array so it has to copy the existing stuff). So actually it woul dbe O(n^2). The only builtin aggregate which looks like it falls in this category would be xmlagg() -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] I/O on select count(*)
"Luke Lonergan" <[EMAIL PROTECTED]> writes: > BTW we¹ve removed HINT bit checking in Greenplum DB and improved the > visibility caching which was enough to provide performance at the same level > as with the HINT bit optimization, but avoids this whole ³write the data, > write it to the log also, then write it again just for good measure² > behavior. > > For people doing data warehousing work like the poster, this Postgres > behavior is miserable. It should be fixed for 8.4 for sure (volunteers?) For people doing data warehousing I would think the trick would be to do something like what we do to avoid WAL logging for tables created in the same transaction. That is, if you're loading a lot of data at the same time then all of that data is going to be aborted or committed and that will happen at the same time. Ideally we would find a way to insert the data with the hint bits already set to committed and mark the section of the table as being only provisionally extended so other transactions wouldn't even look at those pages until the transaction commits. This is similar to the abortive attempt to have the abovementioned WAL logging trick insert the records pre-frozen. I recall there were problems with that idea though but I don't recall if they were insurmountable or just required more work. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Understanding histograms
"Tom Lane" <[EMAIL PROTECTED]> writes: > Right. As a matter of policy we never estimate less than one matching > row; and I've seriously considered pushing that up to at least two rows > except when we see that the query condition matches a unique constraint. > You can get really bad join plans from overly-small estimates. This is something that needs some serious thought though. In the case of partitioned tables I've seen someone get badly messed up plans because they had a couple hundred partitions each of which estimated to return 1 row. In fact of course they all returned 0 rows except the correct partition. (This was in a join so no constraint exclusion) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
"Pavan Deolasee" <[EMAIL PROTECTED]> writes: >> Any special guideline to follow to make HOT working?? >> > > You can do couple of things to benefit from HOT. > > 1. HOT addresses a special, but common case where UPDATE operation > does not change any of the index keys. So check if your UPDATE changes > any of the index keys. If so, see if you can avoid having index > involving that column. Of course, I won't advocate dropping an index > if it would drastically impact your frequently run queries. > > 2. You may leave some free space in the heap (fillfactor less than > 100). My recommendation would be to leave space worth of one row or > slightly more than that to let first UPDATE be an HOT update. > Subsequent UPDATEs in the page may reuse the dead row created by > earlier UPDATEs. > > 3. Avoid any long running transactions. Perhaps we should put this list in the FAQ. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmarks WAS: Sun Talks about MySQL
"Josh Berkus" <[EMAIL PROTECTED]> writes: > Greg, > >> What I was referring to by "passing" TPC-E was the criteria for a conformant >> benchmark run. TPC-C has iirc, only two relevant criteria: "95th percentile >> response time < 5s" and "average response time < 95th percentile response >> time". You can pass those even if 1 transaction in 20 takes 10-20s which is >> more than enough to cover checkpoints and other random sources of >> inconsistent >> performance. > > We can do this now. I'm unhappy because we're at about 1/4 of Oracle > performance, but we certainly pass -- even with 8.2. We certainly can pass TPC-C. I'm curious what you mean by 1/4 though? On similar hardware? Or the maximum we can scale to is 1/4 as large as Oracle? Can you point me to the actual benchmark runs you're referring to? But I just made an off-hand comment that I doubt 8.2 could pass TPC-E which has much more stringent requirements. It has requirements like: the throughput computed over any period of one hour, sliding over the Steady State by increments of ten minutes, varies from the Reported Throughput by no more than 2% -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [pgsql-advocacy] Benchmarks WAS: Sun Talks about MySQL
"Josh Berkus" <[EMAIL PROTECTED]> writes: >> I think TPC-E will make both of these major improvements much more important. >> I suspect it would be hard to get 8.2 to even pass TPC-E due to the >> checkpoint >> dropouts. > > You'd be surprised, then. We're still horribly, horribly lock-bound on TPC-E; > on anything over 4 cores lock resolution chokes us to death. See Jignesh's > and > Paul's various posts about attempts to fix this. Most of those posts have been about scalability issues with extremely large numbers of sessions. Those are interesting too and they may be limiting our results in benchmarks which depend on such a configuration (which I don't think includes TPC-E, but the benchmark Jignesh has been writing about is some Java application benchmark which may be such a beast) but they don't directly relate to whether we're "passing" TPC-E. What I was referring to by "passing" TPC-E was the criteria for a conformant benchmark run. TPC-C has iirc, only two relevant criteria: "95th percentile response time < 5s" and "average response time < 95th percentile response time". You can pass those even if 1 transaction in 20 takes 10-20s which is more than enough to cover checkpoints and other random sources of inconsistent performance. TPC-E has more stringent requirements which explicitly require very consistent response times and I doubt 8.2 would have been able to pass them. So the performance limiting factors whether they be i/o, cpu, lock contention, or whatever don't even come into play. We wouldn't have any conformant results whatsoever, not even low values limited by contention. 8.3 however should be in a better position to pass. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] off-topic: SPAM
"Albe Laurenz" <[EMAIL PROTECTED]> writes: >> The email message sent to [EMAIL PROTECTED] , >> pgsql-performance@postgresql.org requires a confirmation to be delivered. >> Please, answer this email informing the characters that you see in the >> image below. > > Could somebody remove the latter address from the list, please? Unfortunately that's not the address causing the problem. This is a particularly stupid spam filter which is just grabbing that address from your To line. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] mysterious difference in speed when combining two queries with OR
"Hans Ekbrand" <[EMAIL PROTECTED]> writes: > No, I just wanted to show the time differences, I haven't used join > before. Now that you have adviced me to, I have tried your suggestion > to rewrite B as a union and it works good! Just as fast as the A Query! You can even do better. If you know the two sets of mid are disjoint you can use UNION ALL. If not you could remove the two DISTINCTs as the UNION will take care of removing duplicates. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Group by more efficient than distinct?
"Francisco Reyes" <[EMAIL PROTECTED]> writes: > Is there any dissadvantage of using "group by" to obtain a unique list? > > On a small dataset the difference was about 20% percent. > > Group by > HashAggregate (cost=369.61..381.12 rows=1151 width=8) (actual > time=76.641..85.167 rows=2890 loops=1) HashAggregate needs to store more values in memory at the same time so it's not a good plan if you have a lot of distinct values. But the planner knows that and so as long as your work_mem is set to a reasonable size (keeping in mind each sort or other operation feels free to use that much itself even if there are several in the query) and the rows estimate is reasonable accurate -- here it's mediocre but not dangerously bad -- then if the planner is picking it it's probably a good idea. I'm not sure but I think there are cases where the DISTINCT method wins too. This is basically a bug, in an ideal world both queries would generate precisely the same plans since they're equivalent. It's just not a high priority since we have so many more interesting improvements competing for time. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] shared_buffers performance
"Tom Lane" <[EMAIL PROTECTED]> writes: > Gregory Stark <[EMAIL PROTECTED]> writes: >> The transition domain where performance drops dramatically as the database >> starts to not fit in shared buffers but does still fit in filesystem cache. > > It looks to me like the knee comes where the DB no longer fits in > filesystem cache. That does seem to make a lot more sense. I think I misread the units of the size of the accounts table. Reading it again it seems to be in the 1.5G-2G range for the transition which with indexes and other tables might be starting to stress the filesystem cache -- though it still seems a little low. I think if I squint I can see another dropoff at the very small scaling numbers. That must be the point where the database is comparable to the shared buffers size. Except then I would expect the green and blue curves to be pushed to the right a bit rather than just havin a shallower slope. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] shared_buffers performance
"Gaetano Mendola" <[EMAIL PROTECTED]> writes: > The following graph reports the results: > > http://img84.imageshack.us/my.php?image=totalid7.png That's a *fascinating* graph. It seems there are basically three domains. The small domain where the database fits in shared buffers -- though actually this domain seems to hold until the accounts table is about 1G so maybe it's more that the *indexes* fit in memory. Here larger shared buffers do clearly win. The transition domain where performance drops dramatically as the database starts to not fit in shared buffers but does still fit in filesystem cache. Here every megabyte stolen from the filesystem cache makes a *huge* difference. At a scale factor of 120 or so you're talking about a factor of 4 between each of the shared buffer sizes. The large domain where the database doesn't fit in filesystem cache. Here it doesn't make a large difference but the more buffers duplicated between postgres and the filesystem cache the lower the overall cache effectiveness. If we used something like either mmap or directio to avoid the double buffering we would be able to squeeze these into a single curve, as well as push the dropoff slightly to the right. In theory. In practice it would depend on the OS's ability to handle page faults efficiently in the mmap case, and our ability to do read-ahead and cache management in the directio case. And it would be a huge increase in complexity for Postgres and a push into a direction which isn't our "core competency". We might find that while in theory it should perform better our code just can't keep up with Linux's and it doesn't. I'm curious about the total database size as a for each of the scaling factors as well as the total of the index sizes. And how much memory Linux says is being used for filesystem buffers. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance increase with elevator=deadline
"Albe Laurenz" <[EMAIL PROTECTED]> writes: > This refers to the performance problem reported in > http://archives.postgresql.org/pgsql-performance/2008-04/msg00052.php > > After some time of trial and error we found that changing the I/O scheduling > algorithm to "deadline" improved I/O performance by a factor 4 (!) for > this specific load test. What was the algorithm before? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Partitioned tables - planner wont use indexes
"kevin kempter" <[EMAIL PROTECTED]> writes: > that the planner wants to do a sequential scan on each partition. We do have > "constraint_elimination = on" set in the postgresql.conf file. "constraint_exclusion" btw. > myDB=# explain SELECT min(logdate) FROM part_master; Er, yeah. Unfortunately this is just not a kind of query our planner knows how to optimize when dealing with a partitioned table... yet. There are several different pieces missing to make this work. There's some hope some of them might show up for 8.4 but no guarantees. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PG writes a lot to the disk
"Laurent Raufaste" <[EMAIL PROTECTED]> writes: > All this make me think that PG was setting some bit on every row it > used, which caused this massive write activity (3MB/s) in the table > files. I'm talking about approx. 50 SELECT per second for a single > server. Well that's true it does. But only once per row. So analyze would have set the bit on every row. You could do the same thing with something ligter like "select count(*) from ". Tom's thinking was that you would only expect a high update rate for a short time until all those bits were set. Slony's inserts, updates, and deletes count as updates to the table as well. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL NetApp and NFS
"Dawid Kuroczko" <[EMAIL PROTECTED]> writes: > It is also possible to present block devices from NetApp over iSCSI or FC > (I am not sure about licensing model though). You get all the goodies > like thin provisioning (only non-zero blocks are allocated), snapshots and > all, but you see it as a block device. Works fine. Note that Postgres doesn't expect to get "out of space" errors on writes specifically because it pre-allocates blocks. So this "thin provisioning" thing sounds kind of dangerous. > It is also worth to mention that NetApp utilizes somewhat "copy on write" > write strategy -- so whenever you modify a block, new version of the block > is written on its WAFL filesystem. In practical terms it is quite resilient > to > random writes (and that read performance is not stellar ;)). Again, Postgres goes to some effort to keep its reads sequential. So this sounds like it destroys that feature. Random writes don't matter so much because Postgres has its WAL which it writes sequentially. Writes to data files aren't in the critical path and can finish after the transaction is committed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the best way to storage music files in Postgresql
<[EMAIL PROTECTED]> writes: > On Tue, 18 Mar 2008, Gregory Stark wrote: > >> You can have as many parity drives as you want with RAID 5 too. > > you can? I've never seen a raid 5 setup with more then a single parity dirve > (or even the option of having more then one drives worth of redundancy). you > can have hot-spare drives, but thats a different thing. > > what controller/software lets you do this? Hm, some research shows I may have completely imagined this. I don't see why you couldn't but I can't find any evidence that this feature exists. I could have sworn I've seen it before though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the best way to storage music files in Postgresql
<[EMAIL PROTECTED]> writes: > On Wed, 19 Mar 2008, Andrej Ricnik-Bay wrote: > >> >> On 18/03/2008, Craig Ringer <[EMAIL PROTECTED]> wrote: >>> Isn't a 10 or 24 spindle RAID 5 array awfully likely to encounter a >>> double disk failure (such as during the load imposed by rebuild onto a >>> spare) ? > > that's why you should use raid6 (allowing for dual failures) You can have as many parity drives as you want with RAID 5 too. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] best way to run maintenance script
"Vinubalaji Gopal" <[EMAIL PROTECTED]> writes: > On Fri, 2008-03-14 at 18:37 -0700, Tom Lane wrote: >> That's only a little bit better. Read about all the bug fixes you're > > Sure - will eventually upgrade it sometime - but it has to wait for > now :( Waiting for one of those bugs to bite you is a bad plan. We're not talking about an upgrade to 8.1, 8.2, or 8.3. We're talking about taking bug-fixes and security fixes for the release you're already using. Normally it's just a shutdown and immediate restart. There are exceptions listed in the 8.0.6 release notes which would require a REINDEX but they don't affect most people. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] count * performance issue
"Tom Lane" <[EMAIL PROTECTED]> writes: > Well, scanning an index to get a count might be significantly faster > than scanning the main table, but it's hardly "instantaneous". It's > still going to take time proportional to the table size. Hm, Mark's comment about bitmap indexes makes that not entirely true. A bitmap index can do RLE compression which makes the relationship between the size of the table and the time taken to scan the index more complex. In the degenerate case where there are no concurrent updates (assuming you can determine that quickly) it might actually be constant time. > Unless they keep a central counter of the number of index entries; > which would have all the same serialization penalties we've talked > about before... Bitmap indexes do in fact have concurrency issues -- arguably they're just a baroque version of this central counter in this case. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 7 hrs for a pg_restore?
"Chris" <[EMAIL PROTECTED]> writes: >> When I said "obfuscating" I meant it. I'm pretty familiar with sh scripting >> and I'm not even sure what the && behaviour would do. > > It chains commands together so if the first fails the second doesn't happen. I meant in this case, not in general. That is, does it introduce a subshell? Sh traditionally has to introduce to implement some of the logical control and pipe operators. I'm not sure if a simple && is enough but often it's surprising how quickly that happens. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] 7 hrs for a pg_restore?
"Tom Lane" <[EMAIL PROTECTED]> writes: > Erik Jones <[EMAIL PROTECTED]> writes: >> On Feb 20, 2008, at 8:14 AM, Gregory Stark wrote: >>> I would suggest leaving out the && which only obfuscate what's >>> going on here. >>> >>> PGOPTIONS=... pg_restore ... >>> >>> would work just as well and be clearer about what's going on. > >> Right, that's just an unnecessary habit of mine. > > Isn't that habit outright wrong? ISTM that with the && in there, > what you're doing is equivalent to > > PGOPTIONS=whatever > pg_restore ... > > This syntax will set PGOPTIONS for the remainder of the shell session, > causing it to also affect (say) a subsequent psql invocation. Which is > exactly not what is wanted. When I said "obfuscating" I meant it. I'm pretty familiar with sh scripting and I'm not even sure what the && behaviour would do. On at least some shells I think the && will introduce a subshell. In that case the variable would not continue. In bash I think it would because bash avoids a lot of subshells that would otherwise be necessary. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] 7 hrs for a pg_restore?
"Douglas J Hunley" <[EMAIL PROTECTED]> writes: > On Tuesday 19 February 2008 16:32:02 Erik Jones wrote: >> pg_restore is a postgres client app that uses libpq to connect and, >> thus, will pick up anything in your $PGOPTIONS env variable. So, >> >> PGOPTONS="-c maintenance_work_mem=512MB" && pg_restore > > now that's just plain cool > > /me updates our wiki I would suggest leaving out the && which only obfuscate what's going on here. PGOPTIONS=... pg_restore ... would work just as well and be clearer about what's going on. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Optimizing No matching record Queries
"Pallav Kalva" <[EMAIL PROTECTED]> writes: > This index would fix this problem but in general I would like to know what if > there are queries where it does "index scan backwards" and there is no "order > by clause" and the query is still bad ? Would there be a case like that or the > planner uses index scan backwards only when use order by desc also. I think you're oversimplifying. Basically you were asking the planner for the most recent record for a given user. The planner had the choice either of a) going through all the records for a given user and picking the most recent, or b) scanning the records from most recent to oldest and looking for the given user. It was a choice between two evils. If there are a lot of records for the user then a) will be bad since it has to scan all of them to find the most recent and if there are no records for the user then b) will be bad because it'll have to go through all of the records to the beginning of time. The suggested index lets it scan the records for the given user from most recent to oldest without seeing any records for any other user. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Optimizing No matching record Queries
"Stephen Denne" <[EMAIL PROTECTED]> writes: > Pallav Kalva asked > ... >> and listing0_.fklistingsourceid=5525 > ... >> order by listing0_.entrydate desc limit 10; > >>-> Index Scan Backward using idx_listing_entrydate on >> listing listing0_ (cost=0.00..781557.28 rows=5118 width=107) (actual >> time=2113544.412..2113544.412 rows=0 loops=1) >> Filter: (fklistingsourceid = 5525) > > Would it help to have a combined index on fklistingsourceid, entrydate? I think that would help. You already have a ton of indexes, you might consider whether all your queries start with a listingsourceid and whether you can have that as a prefix on the existing index. Another thing to try is raising the stats target on fklistingsourceid and/or entrydate. The estimate seems pretty poor. It could just be that the distribution is highly skewed which is a hard case to estimate correctly. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Benchmark Data requested
"Jignesh K. Shah" <[EMAIL PROTECTED]> writes: > Also very important unless you are running the UPDATE FUNCTIONS which are > separate queries, all these Q1-Q22 Queries are pure "READ-ONLY" queries. > Traditionally I think PostgreSQL does lack "READ-SPEED"s specially since it is > bottlenecked by the size of the reads it does (BLOCKSIZE). Major database > provides multi-block parameters to do multiple of reads/writes in terms of > blocksizes to reduce IOPS and also for read only they also have READ-AHEAD or > prefetch sizes which is generally bigger than multi-block or extent sizes to > aid reads. Note that all of these things are necessitated by those databases using direct i/o of some form or another. The theory is that PostgreSQL doesn't have to worry about these things because the kernel is taking care of it. How true that is is a matter of some debate and it varies from OS to OS. But it's definitely true that the OS will do read-ahead for sequential reads, for example. Incidentally we found some cases that Solaris was particularly bad at. Is there anybody in particular that would be interested in hearing about them? (Not meant to be a knock on Solaris, I'm sure there are other cases Linux or BSD handle poorly too) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Benchmark Data requested
"Jignesh K. Shah" <[EMAIL PROTECTED]> writes: > Then for the power run that is essentially running one query at a time should > essentially be able to utilize the full system (specially multi-core systems), > unfortunately PostgreSQL can use only one core. (Plus since this is read only > and there is no separate disk reader all other processes are idle) and system > is running at 1/Nth capacity (where N is the number of cores/threads) Is the whole benchmark like this or is this just one part of it? Is the i/o system really able to saturate the cpu though? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] RAID arrays and performance
"Kevin Grittner" <[EMAIL PROTECTED]> writes: > >>> On Tue, Jan 29, 2008 at 9:52 AM, in message > <[EMAIL PROTECTED]>, Gregory Stark <[EMAIL PROTECTED]> > wrote: > > > I got this from a back-of-the-envelope calculation which now that I'm trying > > to reproduce it seems to be wrong. Previously I thought it was n(n+1)/2 or > > about n^2/2. So at 16 I would have expected about 128 pending i/o requests > > before all the drives could be expected to be busy. > > That seems right to me, based on the probabilities of any new > request hitting an already-busy drive. > > > Now that I'm working it out more carefully I'm getting that the expected > > number of pending i/o requests before all drives are busy is > > n + n/2 + n/3 + ... + n/n > > What's the basis for that? Well consider when you've reached n-1 drives; the expected number of requests before you hit the 1 idle drive remaining out of n would be n requests. When you're at n-2 the expected number of requests before you hit either of the two idle drives would be n/2. And so on. The last term of n/n would be the first i/o when all the drives are idle and you obviously only need one i/o to hit an idle drive. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] RAID arrays and performance
"Matthew" <[EMAIL PROTECTED]> writes: > On Tue, 29 Jan 2008, Gregory Stark wrote: >>> This was with 8192 random requests of size 8192 bytes from an 80GB test >>> file. >>> Unsorted requests ranged from 1.8 MB/s with no prefetching to 28MB/s with >>> lots >>> of prefetching. Sorted requests went from 2.4MB/s to 38MB/s. That's almost >>> exactly 16x improvement for both, and this is top of the line hardware. >> >> Neat. The curves look very similar to mine. I also like that with your >> hardware the benefit maxes out at pretty much exactly where I had >> mathematically predicted they would ((stripe size)^2 / 2). > > Why would that be the case? Does that mean that we can select a stripe size of > 100GB and get massive performance improvements? Doesn't seem logical to me. To > me, it maxes out at 16x speed because there are 16 discs. Sorry, I meant "number of drives in the array" not number of bytes. So with 16 drives you would need approximately 128 random pending i/o operations to expect all drives to be busy at all times. I got this from a back-of-the-envelope calculation which now that I'm trying to reproduce it seems to be wrong. Previously I thought it was n(n+1)/2 or about n^2/2. So at 16 I would have expected about 128 pending i/o requests before all the drives could be expected to be busy. Now that I'm working it out more carefully I'm getting that the expected number of pending i/o requests before all drives are busy is n + n/2 + n/3 + ... + n/n which is actually n * H(n) which is approximated closely by n * log(n). That would predict that 16 drives would actually max out at 44.4 pending i/o requests. It would predict that my three-drive array would max out well below that at 7.7 pending i/o requests. Empirically neither result seems to match reality. Other factors must be dominating. > Amusingly, there appears to be a spam filter preventing my message (with its > image) getting through to the performance mailing list. This has been plaguing us for a while. When we figure out who's misconfigured system is doing it I expect they'll be banned from the internet for life! -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] RAID arrays and performance
"Matthew" <[EMAIL PROTECTED]> writes: > On Tue, 4 Dec 2007, Gregory Stark wrote: >> FWIW I posted some numbers from a synthetic case to pgsql-hackers >> >> http://archives.postgresql.org/pgsql-hackers/2007-12/msg00088.php >... > This was with 8192 random requests of size 8192 bytes from an 80GB test file. > Unsorted requests ranged from 1.8 MB/s with no prefetching to 28MB/s with lots > of prefetching. Sorted requests went from 2.4MB/s to 38MB/s. That's almost > exactly 16x improvement for both, and this is top of the line hardware. Neat. The curves look very similar to mine. I also like that with your hardware the benefit maxes out at pretty much exactly where I had mathematically predicted they would ((stripe size)^2 / 2). > So, this is FYI, and also an added encouragement to implement fadvise > prefetching in some form or another. How's that going by the way? I have a patch which implements it for the low hanging fruit of bitmap index scans. it does it using an extra trip through the buffer manager which is the least invasive approach but not necessarily the best. Heikki was pushing me to look at changing the buffer manager to support doing asynchronous buffer reads. In that scheme you would issue a ReadBufferAsync which would give you back a pinned buffer which the scan would have to hold onto and call a new buffer manager call to complete the i/o when it actually needed the buffer. The ReadBufferAsync call would put the buffer in some form of i/o in progress. On systems with only posix_fadvise ReadBufferAsync would issue posix_fadvise and ReadBufferFinish would issue a regular read. On systems with an effective libaio the ReadBufferAsync could issue the aio operation and ReadBufferFinish could then do a aio_return. The pros of such an approach would be less locking and cpu overhead in the buffer manager since the second trip would have the buffer handle handy and just have to issue the read. The con of such an approach would be the extra shared buffers occupied by buffers which aren't really needed yet. Also the additional complexity in the buffer manager with the new i/o in progress state. (I'm not sure but I don't think we get to reuse the existing i/o in progress state.) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Performance issues migrating from 743 to 826
"Matthew Lunnon" <[EMAIL PROTECTED]> writes: > In this case the query takes 6.037 ms to run on 862 and 2.332 to run on 743. The difference between 2ms and 6ms is pretty negligable. A single context switch or disk cache miss could throw the results off by that margin in either direction. But what plan does 7.4.3 come up with if you set enable_hashjoins = off? I'm curious whether it comes up with the same nested loops plan as 8.2 and what cost it says it has. I think you need to find queries which take longer to have any reliable performance comparisons. Note that the configuration parameters here aren't the same at all, it's possible the change of effective_cache_size from 800k to 2GB is what's changing the cost estimation. I seem to recall a change in the arithmetic for calculatin Nested loop costs too which made it more aggressive in estimating cache effectiveness. Incidentally, default_statistics_target=1000 is awfully aggressive. I found in the past that that caused the statistics table to become much larger and much slower to access. It may have caused some statistics to be toasted or it may have just been the sheer volume of data present. It will also make your ANALYZEs take a lot longer. I would suggest trying 100 first and incrementally raising it rather than jumping straight to 1000. And preferably only on the columns which really matter. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] More shared buffers causes lower performances
"Greg Smith" <[EMAIL PROTECTED]> writes: > The worst time people can run into a performance > regression is when they're running a popular benchmarking tool. Hm, perhaps pg_bench should do a "show debug_assertions" and print a warning if the answer isn't "off". We could encourage other benchmark software to do something similar. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] More shared buffers causes lower performances
"Alvaro Herrera" <[EMAIL PROTECTED]> writes: > Tom Lane escribió: > >> Currently the docs say that --enable-cassert >> >> Enables assertion checks in the server, which test for >> many cannot happen conditions. This is invaluable for >> code development purposes, but the tests slow things down a little. >> >> Maybe we ought to put that more strongly --- s/a little/significantly/, >> perhaps? > > I don't think it will make any difference, because people don't read > configure documentation. They read configure --help. Fwiw I think you're all getting a bit caught up in this one context. While the slowdown is significant when you take out the stopwatch, under normal interactive use you're not going to notice your queries being especially slow. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] pg_dump performance
"Jared Mauch" <[EMAIL PROTECTED]> writes: > pg_dump is utilizing about 13% of the cpu and the > corresponding postgres backend is at 100% cpu time. > (multi-core, multi-cpu, lotsa ram, super-fast disk). >... > pg8.3(beta) with the following variances from default > > checkpoint_segments = 300# in logfile segments, min 1, 16MB each > effective_cache_size = 512MB# typically 8KB each > wal_buffers = 128MB# min 4, 8KB each > shared_buffers = 128MB# min 16, at least max_connections*2, 8KB > each > work_mem = 512MB # min 64, size in KB Fwiw those are pretty unusual numbers. Normally work_mem is much smaller than shared_buffers since you only need one block of memory for shared buffers and work_mem is for every query (and every sort within those queries). If you have ten queries running two sorts each this setting of work_mem could consume 5GB. Raising shared buffers could improve your pg_dump speed. If all the data is in cache it would reduce the time spend moving data between filesystem cache and postgres shared buffers. What made you raise wal_buffers so high? I don't think it hurts but that's a few orders of magnitude higher than what I would expect to help. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Dual core Opterons beating quad core Xeons?
"Joshua D. Drake" <[EMAIL PROTECTED]> writes: > It is pretty much common knowledge that I think we have too much "common knowledge". -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Dual core Opterons beating quad core Xeons?
"Stuart Bishop" <[EMAIL PROTECTED]> writes: > For pgbench (PG 8.2 running Ubuntu), the Opteron is getting about 6x TPS > over the Xeon (3000+ TPS on Opteron vs ~500 on Xeon). Things get a little > better for Xeon with PG 8.3 (570-540 TPS). There was a problem in the past which affected Xeons. But I thought it had been mostly addressed. Xeons are (or were? these things are always changing) more sensitive to interprocess contention due to their memory architecture though. What are you actually doing in these transactions? Are they read-only? If not is fsync=off (which you don't want if you care about your data but you do if you're trying to benchmark the cpu). Are the crappy disks *identical* crappy disks? If they have different controllers or different drives (or different versions of the OS) then you might be being deceived by write caching on one set and not the other. If they're not read-only transactions and fsync=on then the TPS of 3000+ is not credible and this is likely. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Optimising a query
> Also, try replacing the DISTINCT with GROUP BY. The code path for DISTINCT > unfortunately needs a bit of cleaning up and isn't exactly equivalent to GROUP > BY. In particular it doesn't support hash aggregates which, if your work_mem > is large enough, might work for you here. Sorry, strike that last suggestion. I was looking at the plan and forgot that the query had DISTINCT ON. It is possible to replace DISTINCT ON with GROUP BY but it's not going to be faster than the DISTINCT ON case since you'll need the sort anyways. Actually it's possible to do without the sort if you write some fancy aggregate functions but for this large a query that's going to be awfully complex. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
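For what it's worth, here is a toy sketch of the "fancy aggregate" idea: a custom aggregate that keeps whichever pair has the greater key, so each group is reduced in a single pass with no big sort. All type, function, table, and column names here are invented for the illustration:

    -- composite holding the ordering key plus whatever value you want back
    CREATE TYPE latest_pair AS (sort_key timestamptz, payload text);

    -- transition function: keep the pair with the later sort_key
    CREATE FUNCTION latest_pair_sfunc(latest_pair, latest_pair)
    RETURNS latest_pair LANGUAGE sql IMMUTABLE AS $$
        SELECT CASE WHEN $1 IS NULL OR ($2).sort_key > ($1).sort_key
                    THEN $2 ELSE $1 END
    $$;

    CREATE AGGREGATE latest(latest_pair) (
        SFUNC = latest_pair_sfunc,
        STYPE = latest_pair
    );

    -- one "most recent" value per group without sorting the whole set:
    SELECT group_id,
           (latest(ROW(updated_at, some_column::text)::latest_pair)).payload
    FROM some_table
    GROUP BY group_id;

Whether this actually beats the DISTINCT ON plan depends entirely on the data; it mainly illustrates why the aggregate route gets complicated quickly.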
Re: [PERFORM] Optimising a query
"Richard Huxton" <[EMAIL PROTECTED]> writes: > Paul Lambert wrote: > >> " -> Sort (cost=30197.98..30714.85 rows=206748 width=16) (actual >> >> time=5949.691..7018.931 rows=206748 loops=1)" >> "Sort Key: dealer_id, year_id, subledger_id, account_id" >> "Sort Method: external merge Disk: 8880kB" > Before that though, try issuing a "SET work_mem = '9MB'" before running your > query. If that doesn't change the plan step up gradually. You should be able > to > get the sort stage to happen in RAM rather than on disk (see "Sort Method" > above). FWIW you'll probably need more than that. Try something more like 20MB. Also, note you can change this with SET for just this connection and even just this query and then reset it to the normal value (or use SET LOCAL). You don't have to change it in the config file and restart the whole server. Also, try replacing the DISTINCT with GROUP BY. The code path for DISTINCT unfortunately needs a bit of cleaning up and isn't exactly equivalent to GROUP BY. In particular it doesn't support hash aggregates which, if your work_mem is large enough, might work for you here. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] libgcc double-free, backend won't die
"Craig James" <[EMAIL PROTECTED]> writes: > Gregory Stark wrote: > >> And the only reason to do that would be to work around one bug in one small >> range of glibc versions. If you're going to use a multi-threaded library >> (which isn't very common since it's hard to do safely for all those other >> reasons) surely using a version of your OS without any thread related bugs is >> a better idea. > > You're jumping ahead. This problem has not been accurately diagnosed yet. It > could be that the pthreads issue is completely misleading everyone, and in > fact > there is a genuine memory corruption going on here. Or not. We don't know yet. > I have made zero progress fixing this problem. Well, no that would be you jumping ahead then... You proposed Postgres changing the way it handles threaded libraries based on Tom's suggestion that your problem was something like the glibc problem previously found. My comment was based on the known glibc problem. From what you're saying it's far from certain that the problem would be fixed by changing Postgres's behaviour in the way you proposed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] libgcc double-free, backend won't die
"Tom Lane" <[EMAIL PROTECTED]> writes: > James Mansion <[EMAIL PROTECTED]> writes: >> Is there any particular reason not to ensure that any low-level >> threading support in libc is enabled right >> from the get-go, as a build-time option? > > Yes. > 1) It's of no value to us > 2) On many platforms there is a nonzero performance penalty And the only reason to do that would be to work around one bug in one small range of glibc versions. If you're going to use a multi-threaded library (which isn't very common since it's hard to do safely for all those other reasons) surely using a version of your OS without any thread related bugs is a better idea. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly