[PERFORM] Turn correlated in subquery into join
Looks like the MySQL people found a subquery that PostgreSQL doesn't handle as well as it could: http://s.petrunia.net/blog/ Is there some deeper issue here that I fail to see, or is it simply that it hasn't been implemented but is fairly straightforward? In the link above they do state that it isn't a very common case anyway. /Dennis - Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow Postgresql server
Jason Lustig wrote: and work_mem to 8096. What would cause the computer to only use such a small percentage of the CPU, with more than half of it waiting on I/O requests? Do your webpages write things to the database on each connect? Maybe they do a bunch of writes, each individually committed? For every commit pg will wait for the data to be written down to the disk platter before it moves on. So if you do several writes you want to do them in one transaction, so that you only need one commit. /Dennis ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
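The one-commit-per-write point above can be sketched like this (the table and column names are made up for illustration):

```sql
-- Each INSERT outside a transaction block is its own commit,
-- and each commit waits for the data to reach the disk platter:
INSERT INTO hits (page, ts) VALUES ('/index', now());
INSERT INTO hits (page, ts) VALUES ('/about', now());

-- Grouping the writes means only one synchronous commit:
BEGIN;
INSERT INTO hits (page, ts) VALUES ('/index', now());
INSERT INTO hits (page, ts) VALUES ('/about', now());
COMMIT;
```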
Re: [PERFORM] Upgraded to 8.2.3 --- still having performance issues
Carlos Moreno wrote: The system does very frequent insertions and updates --- the longest table has, perhaps, some 20 million rows, and it's indexed (the primary key is the combination of two integer fields). This longest table only has inserts (and much less frequent selects), at a peak rate of maybe one or a few insertions per second. One or a few inserts per second doesn't sound like that much. I would have expected it to work. If you can, you might want to group several inserts into a single transaction. A standard hint is also to move the WAL onto its own disk, or to get a disk controller with battery-backed RAM. But it's hard to say from your description what the bottleneck is, and thus hard to give any advice. Any help/tips/guidance in troubleshooting this issue? It will be much appreciated! You could try to find specific queries that are slow. Pg can for example log queries for you that run for longer than X seconds. /Dennis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
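The "log queries that run for longer than X seconds" suggestion is a single setting; a sketch (the 1000 ms threshold is just an example value):

```sql
-- In postgresql.conf, or per session as a superuser:
-- log every statement that runs for more than 1000 ms.
SET log_min_duration_statement = 1000;
```

The slow statements then show up in the server log together with their runtime, which makes them easy to pick out for EXPLAIN ANALYZE.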
Re: [PERFORM] Seqscan/Indexscan still a known issue?
Carlos Moreno wrote: When I force it via set enable_seqscan to off, the index scan takes about 0.1 msec (as reported by explain analyze), whereas For the time being, I'm using an explicit enable_seqscan off in the client code, before executing the select. But I wonder: Is this still an issue, or has it been solved in the latest version? For most queries it has never been an issue. Every once in a while there is a query that the planner makes a non-optimal plan for, but it's not that common. In general the optimizer has improved with every new version of pg. Almost everyone I've talked to that has upgraded has got a faster database than before. It was like that for 7.4 to 8.0, for 8.0 to 8.1 and for 8.1 to 8.2. So in your case going from 7.4 to 8.2 is most likely going to give a speedup (especially if you have some queries that aren't just simple primary key lookups). In your case it's hard to give any advice since you didn't share the EXPLAIN ANALYZE output with us. I'm pretty sure it's possible to tune pg so it makes the right choice even for this query of yours, but without the EXPLAIN ANALYZE output we would just be guessing anyway. If you want to share it then it might be helpful to show the plan both with and without seqscan enabled. How often do you run VACUUM ANALYZE; on the database? /Dennis ---(end of broadcast)--- TIP 6: explain analyze is your friend
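Showing the plan both ways, as suggested above, is just two EXPLAIN ANALYZE runs in the same session (the query here is a placeholder):

```sql
EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_col = 42;

SET enable_seqscan TO off;   -- affects only this session
EXPLAIN ANALYZE SELECT * FROM some_table WHERE some_col = 42;
SET enable_seqscan TO on;
```

Comparing the estimated costs and actual times of the two plans usually shows which cost estimate is off.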
Re: [PERFORM] Worse perfomance on 8.2.0 than on 7.4.14
Rolf Østvik wrote: I have a simple query which uses 32ms on 7.4.14 and 1015ms on 8.2.0. I guess 7.4.14 creates a better execution plan than 8.2.0 for this query. Try to turn off planner options in 8.2 to make it generate the same plan as 7.4. Then run EXPLAIN ANALYZE on the query that generates the same plan as in 7.4, and we can compare the costs and maybe understand what goes wrong. For example, try set enable_hashjoin to false; set enable_bitmapscan to false; but you might need to turn off more things to get it to generate the 7.4 plan. /Dennis ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] How import big amounts of data?
On Thu, 29 Dec 2005, Arnau wrote: Which is the best way to import data to tables? I have to import 9 rows into a column and doing it as inserts takes ages. Would it be faster with COPY? Is there any other alternative to INSERT/COPY? Wrap the inserts inside a BEGIN/COMMIT block and it will be a lot faster. COPY is even faster, but for just 9 rows I wouldn't bother. -- /Dennis Björklund ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
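For bigger imports, the COPY alternative looks roughly like this (the table layout and file path are invented for the example):

```sql
-- COPY loads many rows in one command, in one transaction.
-- The server reads the file, so this form needs superuser rights:
COPY users (id, name) FROM '/tmp/users.txt';

-- From an ordinary client, psql's \copy reads a local file instead:
-- \copy users (id, name) from 'users.txt'
```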
Re: [PERFORM] 8.1 iss
On Sun, 6 Nov 2005, PostgreSQL wrote:

SELECT v_barcode, count(v_barcode) FROM lead GROUP BY v_barcode HAVING count(*) > 1;

This is a dual Opteron box with 16 Gb memory and a 3ware SATA raid running 64bit SUSE. Something seems badly wrong.

GroupAggregate  (cost=9899282.83..10285434.26 rows=223858 width=15)
  Filter: (count(*) > 1)
  ->  Sort  (cost=9899282.83..9994841.31 rows=38223392 width=15)
        Sort Key: v_barcode
        ->  Seq Scan on lead  (cost=0.00..1950947.92 rows=38223392 width=15)

What does the plan look like in 8.0? Since it's so much faster I assume you get a different plan.

shared_buffers = 5
work_mem = 16384
maintenance_work_mem = 16384
max_fsm_pages = 10
max_fsm_relations = 5000
wal_buffers = 32
checkpoint_segments = 32
effective_cache_size = 50000
default_statistics_target = 50

The effective_cache_size is way too low, only 390M and you have a machine with 16G. Try bumping it to 1000000 (which means almost 8G; how nice it would be to be able to write 8G instead...). It could be set even higher, but it's hard for me to know what else your memory is used for. I don't know if this setting will affect this very query, but it should have a positive effect on a lot of queries. work_mem also seems low, but it's hard to suggest a good value for it without knowing more about how your database is used. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
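The 390M and 8G figures above follow from effective_cache_size being counted in disk pages, normally 8 kB each; the arithmetic, as a sketch:

```sql
--   50000 pages   * 8 kB  =  ~390 MB
--   1000000 pages * 8 kB  =  ~7.6 GB  ("almost 8G")
SET effective_cache_size = 1000000;   -- for this session; set it in
                                      -- postgresql.conf to make it permanent
```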
Re: [PERFORM] Comparative performance
On Wed, 28 Sep 2005, Joe wrote: Before I post the EXPLAIN and the table schema I'd appreciate confirmation that this list is the appropriate forum. It is, and useful things to show are:

* the slow query
* EXPLAIN ANALYZE of the query
* the output of \d for each table involved in the query
* the output of SHOW ALL;
* the amount of memory the machine has

The settings that are the most important to tune in postgresql.conf for performance are, in my opinion: shared_buffers, effective_cache_size and (to a lesser extent) work_mem. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to improve Postgres performance
On Tue, 30 Aug 2005, Hemant Pandey wrote: So please tell me how can i improve database performance through configuration parameters. I had tried to change parameters in postgresql.conf file but of no avail. Now i am trying to Auto Vacuum, but don't know how to run Auto Vacuum. The most important part is that you need to run VACUUM ANALYZE regularly. Vacuum can be started each night in a cron job, started from pg_autovacuum when it thinks it's needed, or started in some other way. In any case, it has to be run whenever the data in the database has changed enough. The parameters in the config that are most important in my experience are effective_cache_size and shared_buffers. This is a text I like (it's for pg 7.4 but still useful): http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html -- /Dennis Björklund ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Need for speed
On Tue, 16 Aug 2005, Ulrich Wisser wrote: Still when several users are on line the reporting gets very slow. Queries can take more than 2 min. Could you show an example of such a query and the output of EXPLAIN ANALYZE on that query (preferably done when the database is slow)? It's hard to say what is wrong without more information. -- /Dennis Björklund ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0
On Wed, 27 Jul 2005, Matthew Schumacher wrote: Then they do this to insert the token:

INSERT INTO bayes_token (id, token, spam_count, ham_count, atime)
VALUES (?, ?, ?, ?, ?)
ON DUPLICATE KEY UPDATE
  spam_count = GREATEST(spam_count + ?, 0),
  ham_count  = GREATEST(ham_count + ?, 0),
  atime      = GREATEST(atime, ?)

Or update the token:

UPDATE bayes_vars SET
  $token_count_update
  newest_token_age = GREATEST(newest_token_age, ?),
  oldest_token_age = LEAST(oldest_token_age, ?)
WHERE id = ?

I think the reason why the procedure was written for postgres was because of the greatest and least statements performing poorly.

How can they perform poorly when they are dead simple? Here are 2 functions that work for the above cases of greatest and least:

CREATE FUNCTION greatest_int (integer, integer) RETURNS INTEGER
  IMMUTABLE STRICT
  AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;'
  LANGUAGE SQL;

CREATE FUNCTION least_int (integer, integer) RETURNS INTEGER
  IMMUTABLE STRICT
  AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;'
  LANGUAGE SQL;

and these should be inlined by pg and very fast to execute. I wrote a function that should do what the insert above does. The update I've not looked at (I don't know what $token_count_update is), but the update looks simple enough to just implement the same way in pg as in mysql.
For the insert or replace case you can probably use this function:

CREATE FUNCTION insert_or_update_token (xid INTEGER, xtoken BYTEA,
    xspam_count INTEGER, xham_count INTEGER, xatime INTEGER)
RETURNS VOID AS $$
BEGIN
  LOOP
    UPDATE bayes_token
       SET spam_count = greatest_int (spam_count + xspam_count, 0),
           ham_count  = greatest_int (ham_count + xham_count, 0),
           atime      = greatest_int (atime, xatime)
     WHERE id = xid AND token = xtoken;
    IF found THEN
      RETURN;
    END IF;
    BEGIN
      INSERT INTO bayes_token VALUES (xid, xtoken, xspam_count, xham_count, xatime);
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- do nothing, loop and try the UPDATE again
    END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

It's not really tested so I can't tell if it's faster than what you have. What it does do is mimic the way you insert values in mysql. It only works on pg 8.0 and later however, since the exception handling was added in 8.0. -- /Dennis Björklund ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] ported application having performance issues
On Thu, 30 Jun 2005, John Mendenhall wrote: Our setting for effective_cache_size is 2048.

random_page_cost = 4, effective_cache_size = 2048   time approximately 4500ms
random_page_cost = 3, effective_cache_size = 2048   time approximately 1050ms
random_page_cost = 3, effective_cache_size = 4096   time approximately 1025ms

The decrease of random_page_cost to 3 caused the plan to work properly, using the lead_requests table as a join starting point and using the contacts index. The effective_cache_size still looks small. As a rule of thumb you might want effective_cache_size to be something like 1/2 or 2/3 of your total memory. I don't know how much you had, but effective_cache_size = 4096 is only 32M. shared_buffers and effective_cache_size are normally the two most important settings in my experience. -- /Dennis Björklund ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] planner picking more expensive plan
On Fri, 1 Jul 2005, Sam Mason wrote: The key thing with the query that Sam has is that if you turn off seqscan you get the first plan, which runs in 0.4ms, and if seqscan is on the runtime is 27851ms. There are 100 ways to make it select the seq scan, including rewriting the query to something more useful, tweaking different parameters and so on. The interesting part is that pg gives the fast plan a cost of 202 and the slow one a cost of 566141, but it still chooses the slow plan unless seqscan is turned off (or some other tweak with the same effect). It knows very well that the plan with the index scan will be much faster, it just doesn't manage to generate it unless you force it to. It makes you wonder if pg throws away some plans too early in the planning phase.

Limit  (cost=0.00..202.52 rows=10 width=8) (actual time=0.221..0.600 rows=10 loops=1)
  ->  Merge Left Join  (cost=0.00..6628.30 rows=3302780 width=8) (actual time=0.211..0.576 rows=10 loops=1)
        Merge Cond: ("outer".animalid = "inner".animalid)
        ->  Index Scan using animals_pkey on animals a  (cost=0.00..10198983.91 rows=3302780 width=8) (actual time=0.112..0.276 rows=10 loops=1)
        ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual time=0.088..0.235 rows=10 loops=1)
              Filter: (mtypeid = 0)
Total runtime: 0.413 ms

Limit  (cost=565969.42..566141.09 rows=10 width=8) (actual time=27769.047..27769.246 rows=10 loops=1)
  ->  Merge Right Join  (cost=565969.42..57264070.77 rows=3302780 width=8) (actual time=27769.043..27769.228 rows=10 loops=1)
        Merge Cond: ("outer".animalid = "inner".animalid)
        ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual time=0.022..0.154 rows=10 loops=1)
              Filter: (mtypeid = 0)
        ->  Sort  (cost=565969.42..574226.37 rows=3302780 width=8) (actual time=27768.991..27769.001 rows=10 loops=1)
              Sort Key: a.animalid
              ->  Seq Scan on animals a  (cost=0.00..77086.80 rows=3302780 width=8) (actual time=0.039..5620.651 rows=3303418 loops=1)
Total runtime: 27851.097 ms

Another thing to notice is that if one removes the Limit node then the situation is reversed and the plan that pg chooses (with the Limit node) is the one with the lowest cost. The startup cost is however very high, so combining that Merge Join with a Limit will of course produce something slow compared to the upper plan where the startup cost is 0.0. A stand alone test case would be nice, but even without one the above plans are interesting. -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] ETL optimization
On Thu, 23 Jun 2005, Bricklen Anderson wrote: iii. UNIQUE constraint on table t1. This didn't seem to perform too badly with fewer rows (preliminary tests), but as you'd expect, on error the whole transaction would roll back. Is it possible to skip a row if it causes an error, as opposed to aborting the transaction altogether? You don't need to roll back the whole transaction if you use savepoints or the exception features in PL/pgSQL. Take a look at this example: http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
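A minimal sketch of the savepoint approach (table and values are made up; the client has to issue the ROLLBACK TO after noticing the failed statement):

```sql
BEGIN;
INSERT INTO t1 VALUES (1, 'a');      -- succeeds
SAVEPOINT before_row;
INSERT INTO t1 VALUES (1, 'b');      -- violates the UNIQUE constraint
-- roll back only the failed row, keeping the rest of the transaction:
ROLLBACK TO SAVEPOINT before_row;
INSERT INTO t1 VALUES (2, 'c');      -- still part of the same transaction
COMMIT;
```

Savepoints need pg 8.0 or later, like the exception blocks in PL/pgSQL.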
Re: [PERFORM] Needed: Simplified guide to optimal memory configuration
On Wed, 15 Jun 2005, Todd Landfried wrote: So, what I need is to be pointed to (or told) what are the best settings for our database given these memory configurations. What should we do? Maybe this will help: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

NOTICE: shared_buffers is 256

This looks like it's way too low. Try something like 2048. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Query plan changes after pg_dump / pg_restore
On Thu, 9 Jun 2005 [EMAIL PROTECTED] wrote: I am continuously encountering an issue with query plans that changes after a pg_dump / pg_restore operation has been performed. Have placed an export of the query, query plan etc. online at: http://213.173.234.215:8080/plan.htm in order to ensure it's still readable. There is not a major difference in time, so pg is at least not way off (225ms vs. 280ms). The estimated cost is however not very related to the runtime (117 vs 1389). What you have not shown is whether the database is properly tuned. The output of SHOW ALL; could help explain a lot, together with info on how much memory your computer has. The first thing that comes to mind is that you probably have not tuned shared_buffers and effective_cache_size properly (SHOW ALL would tell). -- /Dennis Björklund ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Query plan changes after pg_dump / pg_restore
On Thu, 9 Jun 2005, Jona wrote: It's the same (physical) server as well as the same PostGreSQL daemon, so yes. The only thing that can differ then is the statistics collected and the amount of dead space in tables and indexes (but since you both reindex and run vacuum full, that should not be it). So comparing the statistics in the system tables is the only thing I can think of that might bring some light on the issue. Maybe someone else has some ideas. And as KL said, the effective_cache_size looked like it was way too small. With that setting bigger, pg should select index scans more often. It doesn't explain why the databases behave like they do now, but it might make pg select the same plan nevertheless. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] COPY vs INSERT
On Fri, 6 May 2005, Jim C. Nasby wrote: Has thought been given to supporting inserting multiple rows in a single insert? DB2 supported: INSERT INTO table VALUES( (1,2,3), (4,5,6), (7,8,9) ); I'm not sure how standard that is or if other databases support it. The SQL standard includes this, except that you can not have the outer (). So it should be INSERT INTO table VALUES (1,2,3), (4,5,6), (7,8,9); Does DB2 demand the extra ()? -- /Dennis Björklund ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] First set of OSDL Shared Mem scalability results, some
On Fri, 8 Oct 2004, Josh Berkus wrote: As you can see, the sweet spot appears to be between 5% and 10% of RAM, which is if anything *lower* than recommendations for 7.4! What recommendation is that? To have shared buffers being about 10% of the ram sounds familiar to me. What was recommended for 7.4? In the past we used to say that the worst value is 50%, since then the same things might be cached both by pg and the os disk cache. Why do we expect the shared buffer size sweet spot to change because of the new ARC stuff? And why would it make it better to have bigger shared mem? Wouldn't it be the opposite: since we no longer invalidate as much of the cache for vacuums and seq scans, we can now cache as well as before but with less shared buffers. That said, testing and getting some numbers for good shared mem sizes is good. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
On Wed, 22 Sep 2004, Gaetano Mendola wrote:

Limit  (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 loops=1)

Limit  (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 loops=1)

These estimated costs are almost the same, but the runtimes differ a bit. This means that maybe you need to alter settings like random_page_cost, effective_cache_size and maybe some others to make the cost reflect the runtime better. Since the costs are so close to each other, very small changes can make it choose the other plan. It's also very hard to make an estimate that is correct in all situations. That's why it's called an estimate after all. -- /Dennis Björklund ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
On 22 Sep 2004, Greg Stark wrote: Actually this looks like it's arguably a bug to me. Why does the hash join execute the sequential scan at all? Shouldn't it also like the merge join recognize that the other hashed relation is empty and skip the sequential scan entirely? I'm not sure you can classify that as a bug. It's just that in one of the plans it started with the empty scan and because of that didn't need the other, but with the hash join it started with the table that had 16 rows and then got to the empty one. While I haven't checked, I assume that if it had started with the empty table it would have skipped the other. I don't know what criterion is used to select which part to start with when doing a hash join. It looks like it started with the one that had the highest estimate of rows here; doing it the other way around might be a good idea, because in some cases you are lucky enough to find an empty scan and can omit the other. The above are just observations of the behaviour; I've not seen the source at all. -- /Dennis Björklund ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres
On Sat, 11 Sep 2004, Mark Cotner wrote: There are no performance concerns with MySQL, but it would benefit greatly from stored procedures, views, etc. It is a very large rolling data warehouse that inserts about 4.5 million rows every 2 hours and subsequently rolls this data off the back end of a 90 day window. While it is impossible to know without testing, postgresql has the benefit of readers and writers that do not block each other. So in situations where you do lots of concurrent inserts and selects postgresql should behave well. Merge table definition equivalent. We use these extensively. As far as I can tell a merge table in mysql is the same as a view over a number of unions of other tables, and possibly a rule that defines how inserts will be done if you do inserts in the merged table. Merge table equivalent with all tables containing over 100M rows (and about 40 columns, some quite wide) will need to do index scans in at least 5 seconds (MySQL currently does 2, but we can live with 5) and return ~200 rows. Since each table that is merged will have its own index, the speed should be proportional to the number of tables. Index scans in themselves are very fast, and if you have 30 tables you need 30 index scans. Also, are you sure you really need merge tables? With pg having row locks and mvcc, maybe you could go for a simpler model with just one big table. Often you can also combine that with partial indexes to get a smaller index to use for lots of your queries. Thoughts, suggestions? I see nothing in what you have written that indicates that pg can not do the job, and do it well. It's however very hard to know exactly what the bottleneck is before one tries. There are lots of cases where people have converted mysql applications to postgresql and have gotten a massive speedup. You could be lucky and have such a case, who knows.. I spend some time each day supporting people using postgresql in the #postgresql irc channel (on the freenode.net network).
There I talk to people doing both small and big conversions, and the majority are very happy with the postgresql performance. Postgresql has gotten faster and faster with each release, and while speed was a fair argument a number of years ago it's not like that today. That said, in the end it depends on the application. We are all interested in how it goes (well, at least me :-), so feel free to send more mails keeping us posted. Good luck. -- /Dennis Björklund ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
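The one-big-table-plus-partial-index idea mentioned above can be sketched like this (the table, column names and cutoff date are invented for illustration):

```sql
-- One big table instead of many merged tables:
CREATE TABLE events (
    id      bigint NOT NULL,
    created date   NOT NULL,
    payload text
);

-- A partial index covering only the rows most queries touch;
-- it stays much smaller than an index over the whole table:
CREATE INDEX events_recent_idx ON events (created)
    WHERE created > DATE '2004-06-13';
```

Queries whose WHERE clause implies the index predicate (e.g. `created > DATE '2004-08-01'`) can use the small index.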
Re: [PERFORM] Why those queries do not utilize indexes?
On Fri, 27 Aug 2004, Artimenko Igor wrote:

1. Sequential search and very high cost if set enable_seqscan to on;
   Seq Scan on messageinfo (cost=0.00..24371.30, rows=36802)

2. Index scan but even bigger cost if set enable_seqscan to off;
   Index messagesStatus on messageinfo (cost=0.00..27220.72, rows=36802)

So pg thinks that a sequential scan will be a little bit faster (the cost is a little bit smaller). If you compare the actual runtimes maybe you will see that pg was right. In this case the costs are almost the same, so the runtimes are probably almost the same. When you have more data pg will start to use the index, since then it will be faster to use an index compared to a seq scan. -- /Dennis Björklund ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Optimizer Selecting Incorrect Index
On Wed, 25 Aug 2004, Richard Huxton wrote:

Index Scan using trptserc on trans  (cost=0.00..465.10 rows=44 width=118)
  Index Cond: (trn_patno = 19)
  Filter: ((trn_old_date >= '1994-08-23'::date) AND (trn_old_date <= '2004-08-23'::date) AND (trn_bill_inc = 'B'::bpchar))
(687 rows)

Index Scan using todate on trans  (cost=0.00..105165.74 rows=1 width=143)
  Index Cond: ((trn_old_date >= '1994-08-23'::date) AND (trn_old_date <= '2004-08-23'::date))
  Filter: ((trn_patno = 19) AND (trn_bill_inc = 'B'::bpchar))
(713 rows)

These queries are different. The first returns 687 rows and the second 713 rows.

The 687 and 713 are the number of rows in the plan, not the number of rows the queries return. -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
On Tue, 3 Aug 2004, Martin Foster wrote: to roughly 175 or more. Essentially, the machine seems to struggle to keep up with continual requests and slows down respectively as resources are tied down. I suggest you try to find queries that are slow and check to see if the plans are optimal for those queries. There are some logging options for logging queries that run longer than a user-set limit. That can help finding the slow queries. Just doing some logging for some typical page fetches often shows things that can be done better. For example, it's not uncommon to see the same information being pulled several times by mistake. Maybe you can also try something like connection pooling. I'm not sure how much that can give, but for small queries the connection time is usually the big part. Would disabling 'fsync' provide more performance if I choose that information may be lost in case of a crash? I would not do that. In most cases the performance increase is modest and the data corruption risk after a crash is much bigger, so it's not worth it. If you have a lot of small inserts then it might be faster with this, but if possible it's much better to try to do more work in one transaction instead. -- /Dennis Björklund ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Query gets slow when where clause increases
On Tue, 29 Jun 2004, James Antill wrote:

->  Index Scan using idx_ticket_groups_assigned on ticket_groups g  (cost=0.00..241.76 rows=5 width=20) (actual time=0.13..12.67 rows=604 loops=1)
      Index Cond: (assigned_to = 1540)

Here the planner estimated that it would find 5 rows, but it did find 604. I take that as a sign that you have not run VACUUM ANALYZE recently? If you have done that, then maybe you need to change the statistics target for that column. Before you set it on that column you could try to just alter the default statistics target for one session like this:

SET default_statistics_target TO 100;
ANALYZE;

and then see if you get a better plan when you run the query afterwards. If it helps you can either set default_statistics_target in postgresql.conf or set it just for some column using ALTER TABLE. -- /Dennis Björklund ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
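The per-column variant, using the table and column names from the plan above, would look something like this (the target of 100 is just an example):

```sql
-- Raise the statistics target for the skewed column only,
-- then collect new statistics for the table:
ALTER TABLE ticket_groups ALTER COLUMN assigned_to SET STATISTICS 100;
ANALYZE ticket_groups;
```

This keeps the extra ANALYZE cost confined to the one column that needs the finer-grained statistics.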
Re: [PERFORM] no index-usage on aggregate-functions?
On Tue, 29 Jun 2004, Harald Lau (Sector-X) wrote: Average and sum can never use an index AFAIK, in any db server. You need information from every row. Take a look at the SQLSrv-pendant: create index x_1 on the_table (num_found) select avg(num_found) from the_table - Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1]) But is it really faster? That is the question. This sum needs all the values in that column. As far as I know it uses the index because it uses less space on disk and thus is a little faster due to less IO. In pg the index doesn't work like that, so in pg it's faster to sum all values using the table itself. If you have a WHERE clause to only sum some values, then pg will use an index (if applicable) and you will see a speedup. For min and max the situation is different: there an index can give you the answer without scanning all rows, and for that a workaround exists in pg. The pg aggregate functions are very general and no one has special-cased min/max yet. Until that happens the workaround works and is fast. So, it seems that PG is not so well suited for a datawarehouse and/or performing extensive statistics/calculations/reportings on large tables, is it? I don't see how you can say that from your example. Just because it uses an index for the sum above does not mean that it is a lot faster. It still has to do as many additions as pg does. Sure, mvcc is best when you have both reads and writes, but it should still be comparable in speed even if you only do reads. -- /Dennis Björklund ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
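The min/max workaround mentioned above is the classic rewrite as an ordered, limited index scan (this assumes an index on num_found, as in the quoted example):

```sql
-- Equivalent to max(num_found), but can stop after one index entry:
SELECT num_found FROM the_table
 ORDER BY num_found DESC LIMIT 1;

-- Equivalent to min(num_found); the IS NOT NULL guard keeps NULLs
-- from sorting first:
SELECT num_found FROM the_table
 WHERE num_found IS NOT NULL
 ORDER BY num_found ASC LIMIT 1;
```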
Re: [PERFORM] Quad processor options
On Tue, 11 May 2004, Bjoern Metzdorf wrote: I am curious if there are any real life production quad processor setups running postgresql out there. Since postgresql lacks a proper replication/cluster solution, we have to buy a bigger machine. Do you run the latest version of PG? I've read the thread but have not seen any information about which pg version. All I've seen was a reference to Debian, which might just as well mean that you run pg 7.2 (probably not, but I have to ask). Some classes of queries run much faster in pg 7.4 than in older versions, so if you are lucky that can help. -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] planner/optimizer question
On Fri, 30 Apr 2004, Gary Doades wrote: I should have also pointed out that MSSQL reported that same index scan as taking 65% of the overall query time. It was just faster. The overall query took 103ms in MSSQL. Are your results based on a single client accessing the database and no concurrent updates? Would adding more clients, and maybe having some client that updates/inserts into the tables, still make mssql faster than pg? Maybe it's as simple as pg being optimized for more concurrent users than mssql? I'm just asking, I don't know much about the inner workings of mssql. -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Horribly slow hash join
On Sat, 17 Apr 2004, Tom Lane wrote: *some* set of inputs. (Also, I have been harboring some notions of supporting cross-type hash joins for integer types, which will not work unless small int8 values hash the same as int4 etc.) The simple solution would be to always extend integers to 64 bits (or whatever the biggest integer is) before calculating the hash. It makes the hash function a little slower for smaller types, but it's mostly an operation in the cpu with no memory involved, so it's probably not noticeable. -- /Dennis Björklund ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Horribly slow hash join
On Sun, 18 Apr 2004, Tom Lane wrote: That creates portability issues though. We do not depend on there being a 64-bit-int type for anything except int8 itself, and I don't want to start doing so. What do you mean? int8 is supported on all platforms, and if the hash function converted all numbers to int8 before making the hash it would work. I don't see any portability problems. -- /Dennis Björklund ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Horribly slow hash join
On Sun, 18 Apr 2004, Tom Lane wrote: What do you mean? int8 is supported on all platforms No it isn't. So on platforms where it isn't, you would use int4 as the biggest int. I don't really see that as a problem. As long as you calculate the hash on the biggest int available on that platform it should work. -- /Dennis Björklund ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Horribly slow hash join
On Sun, 18 Apr 2004, Bruno Wolff III wrote: Another option would be to put the numbers into two int4s. For int4 or smaller types one of these would be zero. int8s would be split between the two. The hash function would then be defined on the two int4s. Sure, this is an internal calculation in the hash function. The only important thing is that the number 7 (for example) gives the same hash value no matter if it is an int2 or an int8, and that the hash function works well also for int8 numbers (which it does not today). At least those were the properties I understood we wanted. We got sidetracked into talking about which datatypes exist on all platforms; that's not an issue at all. -- /Dennis Björklund ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] select count(*) on large tables
On Thu, 8 Apr 2004, Cosimo Streppone wrote: The alternative solution I tried, that has an optimal speed up, unfortunately is not a way out, and it is based on EXPLAIN SELECT count(*) output parsing, which is obviously *not* reliable. Try this to get the estimate: SELECT relname, reltuples from pg_class order by relname; The times always get better doing a vacuum (and eventually reindex) of the table, and they slowly lower again. Yes, the estimate is updated by the analyze. Is there an estimate time for this issue to be resolved? It's not so easy to fix. The naive fixes make other operations slower, most notably they reduce concurrency, which is bad since it won't scale as well for many users. You can always keep the count yourself and have triggers that update the count on each insert and delete on the table. It will of course make all inserts and deletes slower, but if you count all rows often maybe it's worth it. Most people do not need to count all rows in a table anyway; you usually count the rows matching some condition. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
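A minimal sketch of the trigger approach described above. All names here are made up for illustration, and the function uses PL/pgSQL with the dollar-quoting syntax of later PostgreSQL releases for readability:

```sql
-- Hypothetical: maintain a row count for "mytable" with triggers,
-- so a full SELECT count(*) becomes a cheap single-row lookup.
CREATE TABLE mytable_count (n bigint NOT NULL);
INSERT INTO mytable_count SELECT count(*) FROM mytable;

CREATE FUNCTION mytable_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE mytable_count SET n = n + 1;  -- one row in mytable_count
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE mytable_count SET n = n - 1;
        RETURN OLD;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_count_trig
AFTER INSERT OR DELETE ON mytable
FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

-- The count is now a cheap read:
SELECT n FROM mytable_count;
```

As the post notes, this taxes every insert and delete (and serializes them on the counter row), so it only pays off if the full count is read often.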
Re: [PERFORM] good pc but bad performance,why?
On Wed, 7 Apr 2004, huang yaqin wrote: You said turning fsync off may cause losing data, that's terrible. I use SCSI disk, and file system is ext3. I tune postgresql.conf and can't get any improvement. So what can I do? Make sure you do as much as possible inside one transaction. If you want to do 1000 inserts, then do BEGIN; insert; insert; ...; COMMIT; -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] simple query join
On Mon, 8 Mar 2004, Chris Smith wrote: assetid| integer | not null default 0 assetid| character varying(255) | not null default '0' The types above do not match, and these are the attributes you use to join. -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
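As an illustration (the table names here are assumed; only the column definitions appear in the quoted schema), an explicit cast makes the joined types match, which the planner of that era needed in order to consider an index on the join column:

```sql
-- Hypothetical sketch: cast the integer side to varchar so both sides
-- of the join comparison have the same type. The safer long-term fix
-- is to alter one column so both are genuinely the same type.
SELECT *
FROM   asset a
JOIN   asset_link l ON l.assetid = a.assetid::varchar;
```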
Re: [PERFORM] Fixed width rows faster?
On Fri, 5 Mar 2004, Jeff wrote: Is there any performance advantage to using a fixed width row (with PG)? As far as I know there is only a small win when you want to extract some field from a tuple, since with variable width fields you have to walk the tuple to find the correct field. But this is a small performance problem unless you have very many variable size columns in the table. different databases and we have also some custom software here that uses fixed width rows to be able to hit row N in O(1) This cannot happen in pg since there is no row N. Every transaction can have a different view of the table: some rows are visible and some others are not. To find row N you have to walk from the start and inspect every tuple to see if it's visible to this transaction or not. -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] query optimization question
On Thu, 29 Jan 2004, Tom Lane wrote: jackdb-# GROUP BY memberid_ HAVING ( Um, that's not what I had in mind at all. Does GROUP BY actually do anything at all here? (You didn't answer me as to whether memberid_ is a unique identifier or not, but if it is, this GROUP BY is just an expensive no-op.) From your comment I assume that there is no transformation in pg that detects that the group by columns are unique? this is all blue-sky speculation anyway. What I'm actually more interested in is your statement that MSSQL can do the original query quickly. I find that a bit hard to believe because I don't see any relevant optimization techniques. Wouldn't getting rid of the group by give that kind of speedup? Maybe mssql manages to rewrite the query like that before executing it. -- /Dennis Björklund ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
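To illustrate the "expensive no-op" point, a made-up table (the name is assumed; only the memberid_ column appears in the thread):

```sql
-- If memberid_ is unique, all three return the same rows, but the
-- first two force a sort/aggregate or deduplication step that a
-- planner unaware of the uniqueness cannot remove:
SELECT memberid_ FROM members GROUP BY memberid_;
SELECT DISTINCT memberid_ FROM members;  -- equally redundant on a unique key
SELECT memberid_ FROM members;           -- same result, no grouping step
```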
Re: [PERFORM] query optimization question
On Thu, 29 Jan 2004, Jack Coates wrote: Probably better to repost it as a gzip'd attachment. That should complete with a picture of the GUI version. 26k zipped, let's see if this makes it through. Are you sure you attached it? At least when it got here there was no attachment. -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Really slow even after vacuum
On Wed, 21 Jan 2004, Jeroen Baekelandt wrote: jms_messages again. It takes 80 seconds!?! While before, with 1000 records, it took only a fraction of a second. run: VACUUM FULL ANALYZE; -- /Dennis Björklund ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Explain not accurate
On Fri, 9 Jan 2004, Richard van den Berg wrote: problems. However, I've run into an issue where explain tells us the costs of a query are tremendous (105849017586), but the query actually runs quite fast. Even explain analyze shows these costs. It would be helpful if you could show the query and the EXPLAIN ANALYZE output of the query (and not just EXPLAIN). This makes me wonder: can the estimates explain shows be dead wrong? Of course they can. An estimate is just an estimate. If you have not analyzed the database then it's most likely wrong. Dead wrong is not common, but not impossible. Run VACUUM ANALYZE and see if the estimates are better after that. -- /Dennis Björklund ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
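A sketch of the difference (table and column names are placeholders): EXPLAIN prints only the planner's estimates, while EXPLAIN ANALYZE also executes the query and prints the actual times and row counts next to them.

```sql
-- Estimates only; nothing is executed:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Runs the query and reports actual timing; comparing the estimated
-- "rows=" with the "actual ... rows=" in each plan node shows how far
-- off the planner's statistics are:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
```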
Re: [PERFORM] annoying query/planner choice
On Sun, 11 Jan 2004, Andrew Rawnsley wrote: 20-25% of the time. Fiddling with CPU_TUPLE_COST doesn't do anything until I exceed 0.5, which strikes me as a bit high (though please correct me if I am assuming too much...). RANDOM_PAGE_COST seems to have no effect. What about the effective cache size, is that set properly? -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Question about difference in performance of 2 queries
On Mon, 29 Dec 2003, Sean Shanny wrote: The first plan below has horrendous performance. We only get about 2% CPU usage and iostat shows 3-5 MB/sec IO. The second plan runs at 30% cpu and 15-30 MB/sec IO. Could someone shed some light on why the huge difference in performance? Both are doing index scans plus a filter. We have no content_keys below -1 at this time so the queries return the same results. EXPLAIN ANALYZE gives more information than EXPLAIN, and is preferred. It uses different indexes in the two queries, and one seems to be faster than the other. Why, I can't tell yet. I would assume that you would get the fastest result if you had an index on (content_key, date_key). I don't know if pg will even use an index to speed up such an operation. If it did, it could use the idx_pageviews_content index. Why it chose that one when the other would be faster I don't know. Maybe explain analyze will give some hint. -- /Dennis ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
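A sketch of the suggested index (the table name is a guess based on the idx_pageviews_content index mentioned in the reply):

```sql
-- Hypothetical: one composite index covering both filter columns, so
-- the condition on content_key and the range on date_key can be
-- satisfied by a single index scan instead of a scan plus filter:
CREATE INDEX idx_pageviews_content_date
    ON pageviews (content_key, date_key);
```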
Re: [PERFORM] Why is restored database faster?
On Thu, 18 Dec 2003, Shridhar Daithankar wrote: Well, then the only issue left is file sytem defragmentation. And the internal fragmentation that can be fixed with the CLUSTER command. -- /Dennis ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Why is restored database faster?
On Tue, 16 Dec 2003, David Shadovitz wrote: I backed up my database using pg_dump, and then restored it onto a different server using psql. I see that the query SELECT COUNT(*) FROM myTable executes immediately on the new server but takes several seconds on the old one. (The servers are identical.) What could account for this difference? Clustering? How can I get the original server to perform as well as the new one? You probably need to run VACUUM FULL. It locks the tables during its execution, so only do it when the database is not in full use. If this helps, you probably need to run normal vacuums more often and maybe increase max_fsm_pages. -- /Dennis ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] a lot of problems with pg 7.4
On Sat, 13 Dec 2003, Kari Lavikka wrote: I evaluated pg 7.4 on our development server and it looked just fine but performance with production loads seems to be quite poor. Most of performance problems are caused by nonsensical query plans Some of the estimates pg made in the plans you showed were way off. I assume you have run VACUUM ANALYZE recently? If that does not help, maybe you need to increase the statistics gathering on some columns so that pg makes better estimates. With wrong statistics it's not strange that pg chooses bad plans. -- /Dennis ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
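For example, raising the per-column statistics target looks like this (table and column names are placeholders):

```sql
-- Hypothetical: sample this column more thoroughly (default target
-- was 10 at the time), then re-analyze so the planner sees the new,
-- more detailed statistics:
ALTER TABLE some_table ALTER COLUMN some_col SET STATISTICS 100;
ANALYZE some_table;
```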
Re: [PERFORM] Optimize
On Mon, 24 Nov 2003, Torsten Schulz wrote: sort_mem = 32768 # min 32 32 meg per sort can be a lot in total if you have many clients sorting things. I assume you have checked that the computer is not pushed into swapping when you have a peak with lots of users. A swapping computer is never fast. Using some swap space is not bad, but a lot of paging in and out to swap is not good. -- /Dennis ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Help with count(*)
On Fri, 14 Nov 2003, Tom Lane wrote: I believe the previous discussion also agreed that we wanted to postpone the freezing of now(), which currently also happens at BEGIN rather than the first command after BEGIN. Or should that happen at the first call to now()? /me should go back and try to find this previous discussion. -- /Dennis ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] *very* slow query to summarize data for a month ...
On Mon, 10 Nov 2003, Marc G. Fournier wrote: explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic FROM company c, traffic_logs ts WHERE c.company_id = ts.company_id AND month_trunc(ts.runtime) = '2003-10-01' GROUP BY company_name,ts.company_id; What if you do ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01' and add an index like (runtime, company_name, company_id)? -- /Dennis ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
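Spelled out, the suggested rewrite might look like this. The index here is a sketch on traffic_logs alone (an index cannot span tables, and company_name lives in the company table), so it deviates slightly from the column list named in the reply:

```sql
-- Hypothetical index supporting the range condition:
CREATE INDEX traffic_logs_runtime_idx
    ON traffic_logs (runtime, company_id);

-- The month_trunc() call hides runtime from a plain index on it;
-- an explicit half-open range is sargable:
SELECT ts.company_id, c.company_name, SUM(ts.bytes) AS total_traffic
FROM   company c, traffic_logs ts
WHERE  c.company_id = ts.company_id
  AND  ts.runtime >= '2003-10-01'
  AND  ts.runtime <  '2003-11-01'
GROUP BY c.company_name, ts.company_id;
```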
Re: [PERFORM] One or more processor ?
On Fri, 10 Oct 2003, [iso-8859-15] Hervé Piedvache wrote: A simple question about PostgreSQL ... I have a Pentium Xeon Quadri processors ... If I do a SQL request ... does PostgreSQL use one or more processor ? Each connection becomes a process, and each process runs on one processor. So, with only one connection you use only one processor (and the OS might use another processor). Most databases have many concurrent users, and then more processors will be used. -- /Dennis ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] OFFTOPIC: PostgreSQL vs MySQL
On Thu, 9 Oct 2003, David Griffiths wrote: PostgreSQL supports constraints. MySQL doesn't; programmers need to take care of that from the client side Again, InnoDB supports constraints. Really? This is news. We did some tests on constraints on InnoDB, and found that while they parsed, they were not actually enforced. Was our test in error? You may have turned them off to load data? I've run into constraints when my data-load script missed some rows in address_type. When it went to do the address_list table, all rows that had the missing address_type failed, as they should. I saw no weakness in the constraints. It sounds like you talk about foreign keys only, while the previous writer talks about other constraints also. For example, in postgresql you can do: CREATE TABLE foo ( x int, CONSTRAINT bar CHECK (x > 5) ); and then # INSERT INTO foo VALUES (4); ERROR: ExecInsert: rejected due to CHECK constraint bar on foo I don't know MySQL, but I've got the impression from other posts on the lists that innodb supports foreign keys only. I might be wrong though. -- /Dennis ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
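For contrast, here is how the two kinds of constraint under discussion are declared side by side. The names are borrowed loosely from the address_type/address_list example in the quoted message and are illustrative only:

```sql
-- Referential integrity: a foreign key (the kind the poster saw
-- InnoDB enforce):
CREATE TABLE address_type (
    id int PRIMARY KEY
);
CREATE TABLE address_list (
    addr_type int REFERENCES address_type (id),  -- foreign key
    x int CHECK (x > 5)                          -- CHECK constraint
);
```

An insert with an addr_type missing from address_type fails the foreign key; an insert with x = 4 fails the CHECK. Enforcing one kind says nothing about the other.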
Re: [PERFORM] restore time: sort_mem vs. checkpoing_segments
On Tue, 23 Sep 2003, Bruce Momjian wrote: With the new warning about too-frequent checkpoints, people have actual feedback to encourage them to increase checkpoint_segments. One issue is that it is likely to recommend increasing checkpoint_segments during restore, even if there is no value to it being large during normal server operation. Should that be documented? One could have a variable that turns off that warning, and have pg_dump insert a statement to turn it off. That is, if one never wants these warnings from a restore (from a new dump). In any case, documentation is good and still needed. -- /Dennis ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Reading data in bulk - help?
On Wed, 10 Sep 2003, Chris Huston wrote: Sometime before then I need to figure out how to cut the fetch times from the now 200ms to something like 10ms. You didn't say anything about Josh's first point of adjusting postgresql.conf to match your machine. Settings like effective_cache_size you almost always want to increase from the default, and the same goes for the shared memory settings. -- /Dennis ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings