Re: [PERFORM] 8.1 -> 8.4 regression
Can you force 8.4 to generate the same plan as 8.1, for example by running SET enable_hashjoin = off; before you run EXPLAIN on the query? If so, we can compare the numbers from the forced plan with the old plan and maybe figure out why 8.4 didn't use the same plan it did in 8.1. Note that the solution is not to force the plan, but it can give us more information.

/Dennis

> is at least one query which has degraded in performance quite a bit. Here
> is the plan on 8.4.2:
> http://wood.silentmedia.com/bench/842
>
> Here is the very much less compact plan for the same query on 8.1.19:
> http://wood.silentmedia.com/bench/8119

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
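The forced-plan comparison described above can be sketched as follows; the statement itself is a placeholder for the regressed query, and enable_hashjoin is just the first knob to try:

```sql
-- Sketch only: disable one plan type at a time until EXPLAIN shows
-- the 8.1-style plan, then compare the actual row counts and costs.
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT ...;                 -- substitute the regressed query here
RESET enable_hashjoin;      -- restore the default afterwards
```

If the forced plan matches the 8.1 one, the misestimated node usually stands out when the two EXPLAIN ANALYZE outputs are compared side by side.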
Re: [PERFORM] select on 22 GB table causes "An I/O error occured while sending to the backend." exception
On Thu, Aug 28, 2008 at 8:11 PM, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> > wait a min here, postgres is supposed to be able to survive a complete box
> > failure without corrupting the database, if killing a process can corrupt
> > the database it sounds like a major problem.
>
> Yes it is a major problem, but not with postgresql. It's a major
> problem with the linux OOM killer killing processes that should not be
> killed.
>
> Would it be postgresql's fault if it corrupted data because my machine
> had bad memory? Or a bad hard drive? This is the same kind of
> failure. The postmaster should never be killed. It's the one thing
> holding it all together.

I fail to see the difference between the OOM killer killing it and the power going out. And yes, if the power went out and PG came up with a corrupted DB (assuming I didn't turn off fsync, etc.) I *would* blame PG. I understand that killing the postmaster could stop all useful PG work, that it could cause it to stop responding to clients, that it could even "crash" PG, et cetera, but if a particular process dying causes corrupted DBs, that sounds borked to me.
Re: [PERFORM] how does pg handle concurrent queries and same queries
Slightly off-topic, but judging from the fact that you were able to "fix" the query, it seems you have some way to modify the application code itself. In that case, I'd try to implement caching (at least for this statement) on the application side, for example with memcached.
Re: [PERFORM] index scan cost
The "fast" server makes a much more accurate estimate of the number of rows to expect (4 rows returned, 1 estimated). The "slow" server estimates 1151 rows. Try running ANALYZE on the slow one.
Re: [PERFORM] Possible Redundancy/Performance Solution
> 1) RAID1+0 to make one big volume
> 2) RAID1 for OS/apps/etc, RAID1 for database
> 3) RAID1 for OS+xlog, RAID1 for database
> 4) RAID1 for OS+popular tables, RAID1 for rest of database

Lots of good info, thanks for all the replies. It seems to me, then, that the speed increase you'd get from RAID0 is not worth the downtime risk, even when you have multiple servers. I'll start pricing things out and see what options we have.

Thanks again,
Dennis
Re: [PERFORM] Possible Redundancy/Performance Solution
Greg Smith wrote:
> On Tue, 6 May 2008, Dennis Muhlestein wrote:
>
> Since disks are by far the most likely thing to fail, I think it would be bad planning to switch to a design that doubles the chance of a disk failure taking out the server just because you're adding some server-level redundancy. Anybody who's been in this business for a while will tell you that seemingly improbable double failures happen, and if I were you I'd want a plan that survives a) a single disk failure on the primary and b) a single disk failure on the secondary at the same time. Let me strengthen that--I don't feel comfortable unless I'm able to survive a single disk failure on the primary and complete loss of the secondary (say by power supply failure), because a double failure that starts that way is a lot more likely than you might think. Especially with how awful hard drives are nowadays.

Those are good points. So you'd go ahead and add the pgpool in front (or another redundancy approach), but then use RAID1, 5, or perhaps 10 on each server?

-Dennis
Re: [PERFORM] Possible Redundancy/Performance Solution
Greg Smith wrote:
> On Tue, 6 May 2008, Dennis Muhlestein wrote:
>
> RAID0 on two disks makes a disk failure that will wipe out the database twice as likely. If your goal is better reliability, you want some sort of RAID1, which you can do with two disks. That should increase read throughput a bit (not quite double though) while keeping write throughput about the same.

I was planning on pgpool being the cushion between the RAID0 failure probability and my need for redundancy. This way, I get protection against not only disks, but CPU, memory, network cards, motherboards, etc. Is this not a reasonable approach?

> If you added four disks, then you could do a RAID1+0 combination which should substantially outperform your existing setup in every respect while also being more resilient to drive failure.
>
>> Our applications are mostly read intensive. I don't think that having two databases on one machine, where previously we had just one, would add too much of an impact, especially if we use the load balance feature of pgpool as well as the redundancy feature.
>
> A lot depends on how much RAM you've got and whether it's enough to keep the cache hit rate fairly high here. A reasonable thing to consider here is doing a round of standard performance tuning on the servers to make sure they're operating efficiently before increasing their load.
>
>> Can anyone comment on any gotchas or issues we might encounter?
>
> Getting writes to replicate to multiple instances of the database usefully is where all the really nasty gotchas are in this area. Starting with that part and working your way back toward the front-end pooling from there should crash you into the hard parts early in the process.

Thanks for the tips!
Dennis
[PERFORM] Possible Redundancy/Performance Solution
Right now, we have a few servers that host our databases. None of them are redundant. Each hosts databases for one or more applications. Things work reasonably well, but I'm worried about the availability of some of the sites. Our hardware is 3-4 years old at this point and I'm not naive to the possibility of drives, memory, motherboards or whatever failing.

I'm toying with the idea of adding a little redundancy and maybe some performance to our setup. First, I'd replace our SATA hard drives with a SCSI controller and two SCSI hard drives that run RAID0 (probably running the OS and logs on the original SATA drive). Then I'd run the previous two databases on one cluster of two servers with pgpool in front (using the redundancy feature of pgpool).

Our applications are mostly read intensive. I don't think that having two databases on one machine, where previously we had just one, would add too much of an impact, especially if we use the load balance feature of pgpool as well as the redundancy feature.

Can anyone comment on any gotchas or issues we might encounter? Do you think this strategy has the possibility to accomplish what I'm originally setting out to do?

TIA
-Dennis
Re: [PERFORM] Where do a novice do to make it run faster?
A B wrote:
> So, it is time to improve performance, it is running too slow. AFAIK (as a novice) there are a few general areas:
>
> 1) hardware
> 2) rewriting my queries and table structures
> 3) using more predefined queries
> 4) tweaking parameters in the db conf files
>
> Of these points:
> 1) is nothing I can do anything about right now, but in the future perhaps.
> 2) will be quite hard right now since there is more code than time.
> 3) almost like 2 but perhaps more do-able with the current constraints.
> 4) This seems to be the easiest one to start with...
>
> So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in that. Thank you :-)

First, change your log settings: set log_min_duration_statement to something like 1000 (one second). This will allow you to see which statements take the longest.

Second, use EXPLAIN ANALYZE on those statements to determine what is taking a long time, and focus on optimizing the statements that take the longest to execute.

That ought to get you a long way down the road.

-Dennis
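The two steps above can be sketched like this; the one-second threshold and the sample statement are only illustrative:

```sql
-- In postgresql.conf (then reload):
--   log_min_duration_statement = 1000   -- log statements slower than 1s
--
-- For each statement that then shows up in the server log:
EXPLAIN ANALYZE
SELECT ...;   -- substitute the logged statement here
```

Comparing the estimated row counts against the actual ones in the EXPLAIN ANALYZE output is usually what points at the part of the plan worth optimizing.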
[PERFORM] Turn correlated in subquery into join
Looks like the MySQL people found a subquery that PostgreSQL doesn't handle as well as it could: http://s.petrunia.net/blog/

Is there some deeper issue here that I fail to see, or is it simply that it hasn't been implemented but is fairly straightforward? In the link above they do state that it isn't a very common case anyway.

/Dennis
Re: [PERFORM] Slow Postgresql server
Jason Lustig skrev:
> and work_mem to 8096. What would cause the computer to only use such a small percentage of the CPU, with more than half of it waiting on I/O requests?

Do your webpages write things to the database on each connect? Maybe they do a bunch of writes, each individually committed? For every commit, pg will wait for the data to be written down to the disk platter before it moves on. So if you do several writes, you want to do them in one transaction so you only need one commit.

/Dennis
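The one-transaction advice above looks like this in practice; the table and column names are invented for illustration:

```sql
-- One commit instead of three: only COMMIT waits for the WAL flush.
BEGIN;
INSERT INTO page_views (url, viewed_at) VALUES ('/home', now());
INSERT INTO page_views (url, viewed_at) VALUES ('/about', now());
UPDATE counters SET hits = hits + 2 WHERE name = 'total';
COMMIT;
```

With autocommit, each of those statements would have waited for its own synchronous disk write, which is what shows up as CPU idle time and I/O wait.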
Re: [PERFORM] Upgraded to 8.2.3 --- still having performance issues
Carlos Moreno skrev:
> The system does very frequent insertions and updates --- the longest table has, perhaps, some 20 million rows, and it's indexed (the primary key is the combination of two integer fields). This longest table only has inserts (and much less frequent selects), at a peak rate of maybe one or a few insertions per second.

One or a few inserts per second doesn't sound like that much. I would have expected it to work. If you can, you might want to group several inserts into a single transaction. A standard hint is also to move the WAL onto its own disk, or to get a disk controller with battery-backed RAM. But it's hard to say from your description what the bottleneck is, and thus hard to give any advice.

> Any help/tips/guidance in troubleshooting this issue? It will be much appreciated!

You could try to find the specific queries that are slow. Pg can, for example, log queries for you that run for longer than X seconds.

/Dennis
Re: [PERFORM] Seqscan/Indexscan still a known issue?
Carlos Moreno skrev:
> When I force it via "set enable_seqscan to off", the index scan takes about 0.1 msec (as reported by explain analyze), whereas
> For the time being, I'm using an explicit "enable_seqscan off" in the client code, before executing the select. But I wonder: Is this still an issue, or has it been solved in the latest version?

For most queries it has never been an issue. Every once in a while there is a query that the planner makes a non-optimal plan for, but it's not that common. In general the optimizer has improved with every new version of pg. Almost everyone I've talked to who has upgraded has got a faster database than before. It was like that for 7.4->8.0, for 8.0->8.1 and for 8.1->8.2. So in your case, going from 7.4 to 8.2 is most likely going to give a speedup (especially if you have some queries that aren't just simple primary key lookups).

In your case it's hard to give any advice, since you didn't share the EXPLAIN ANALYZE output with us. I'm pretty sure it's possible to tune pg so it makes the right choice even for this query of yours, but without the EXPLAIN ANALYZE output we would just be guessing anyway. If you want to share it, then it might be helpful to show the plan both with and without seqscan enabled.

How often do you run VACUUM ANALYZE; on the database?

/Dennis
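Producing both plans for comparison, as suggested above, can be done like this; the statement is a placeholder for whichever query misbehaves:

```sql
EXPLAIN ANALYZE SELECT ...;   -- the plan the optimizer picks on its own
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT ...;   -- the forced index-scan plan
RESET enable_seqscan;
```

Posting both outputs makes it possible to see which node's cost or row estimate leads the planner astray.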
Re: [PERFORM] Worse perfomance on 8.2.0 than on 7.4.14
Rolf Østvik skrev:
> I have a simple query which uses 32ms on 7.4.14 and 1015ms on 8.2.0. I guess 7.4.14 creates a better execution plan than 8.2.0 for this query but

Try turning off planner options in 8.2 to make it generate the same plan as 7.4. Then run EXPLAIN ANALYZE on the query once it generates the same plan as in 7.4, and we can compare the costs and maybe understand what goes wrong. For example, try

set enable_hashjoin to false;
set enable_bitmapscan to false;

but you might need to turn off more things to get it to generate the 7.4 plan.

/Dennis
Re: [PERFORM] How import big amounts of data?
On Thu, 29 Dec 2005, Arnau wrote:
> Which is the best way to import data to tables? I have to import
> 9 rows into a column and doing it as inserts takes ages. Would it be
> faster with copy? Is there any other alternative to insert/copy?

Wrap the inserts inside a BEGIN/COMMIT block and it will be a lot faster. Copy is even faster, but for just 9 rows I wouldn't bother.

--
/Dennis Björklund
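Both suggestions above, sketched with a made-up table; for a large batch, COPY is the usual winner:

```sql
-- Batched inserts: the whole batch pays the commit cost once.
BEGIN;
INSERT INTO items VALUES (1, 'first');
INSERT INTO items VALUES (2, 'second');
-- ... remaining rows ...
COMMIT;

-- COPY: fastest for bulk loads; here reading tab-separated rows
-- from a server-side file (path is an example).
COPY items FROM '/tmp/items.tsv';
```

COPY avoids per-statement parse and plan overhead entirely, which is why it scales so much better than individual inserts as the row count grows.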
Re: [PERFORM] 8.1 iss
On Sun, 6 Nov 2005, PostgreSQL wrote:
> SELECT v_barcode, count(v_barcode) FROM lead GROUP BY v_barcode HAVING
> count(*) > 1;
>
> This is a dual Opteron box with 16 Gb memory and a 3ware SATA raid
> running 64bit SUSE. Something seems badly wrong.
>
> GroupAggregate  (cost=9899282.83..10285434.26 rows=223858 width=15)
>   Filter: (count(*) > 1)
>   ->  Sort  (cost=9899282.83..9994841.31 rows=38223392 width=15)
>         Sort Key: v_barcode
>         ->  Seq Scan on lead  (cost=0.00..1950947.92 rows=38223392 width=15)

What does the plan look like in 8.0? Since it's so much faster, I assume you get a different plan.

> shared_buffers = 5
> work_mem = 16384
> maintenance_work_mem = 16384
> max_fsm_pages = 10
> max_fsm_relations = 5000
> wal_buffers = 32
> checkpoint_segments = 32
> effective_cache_size = 50000
> default_statistics_target = 50

The effective_cache_size is way too low, only 390M on a machine with 16G. Try bumping it to 1000000 (which means almost 8G; how nice it would be to be able to write 8G instead...). It could be set even higher, but it's hard for me to know what else your memory is used for. I don't know if this setting will affect this very query, but it should have a positive effect on a lot of queries.

work_mem also seems low, but it's hard to suggest a good value for it without knowing more about how your database is used.

--
/Dennis Björklund
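The advice above amounts to a config fragment along these lines (a sketch for a 16 GB box; effective_cache_size is counted in 8 kB pages and work_mem in kB, and the exact values are assumptions to adjust for the real memory budget):

```
# postgresql.conf fragment (illustrative values)
effective_cache_size = 1000000   # ~8 GB: roughly what the OS cache holds
work_mem = 65536                 # 64 MB per sort/hash node; mind concurrency
```

Since work_mem applies per sort or hash operation, a value this large is only safe when few such operations run at once.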
Re: [PERFORM] Need help in setting optimal configuration for a huge
On Sun, 23 Oct 2005, Kishore B wrote:
> We need to insert into the bigger table almost every second, throughout
> its lifetime. In addition, we receive at least 200,000 records a day at
> a fixed time.
>
> We are facing a critical situation because of the performance of the
> database. Even a basic query like select count(*) from bigger_table is
> taking about 4 minutes to return.

Count(*) like that always scans the full table, but 4 minutes still sounds like a lot. How often do you vacuum? Could it be that the disk is full of garbage due to not enough vacuuming? A query like this can help find bloat:

SELECT oid::regclass, reltuples, relpages FROM pg_class ORDER BY 3 DESC;

I assume you do updates and deletes as well, and not just inserts?

--
/Dennis Björklund
Re: [PERFORM] Comparative performance
On Wed, 28 Sep 2005, Joe wrote:
> Before I post the EXPLAIN and the table schema I'd appreciate
> confirmation that this list is the appropriate forum.

It is, and useful things to show are:

* the slow query
* EXPLAIN ANALYZE of the query
* the output of \d for each table involved in the query
* the output of SHOW ALL;
* the amount of memory the machine has

The settings that are most important to tune in postgresql.conf for performance are, in my opinion: shared_buffers, effective_cache_size and (to a lesser extent) work_mem.

--
/Dennis Björklund
Re: [PERFORM] How to improve Postgres performance
On Tue, 30 Aug 2005, Hemant Pandey wrote:
> So please tell me how can I improve database performance through
> configuration parameters. I had tried to change parameters in the
> postgresql.conf file but to no avail.
> Now I am trying Auto Vacuum, but don't know how to run Auto Vacuum.

The most important part is that you need to run VACUUM ANALYZE regularly. Vacuum can be started each night in a cron job, started from pg_autovacuum when it thinks it's needed, or started in some other way. In any case, it has to be run whenever the data in the database has changed enough.

The parameters in the config that are most important in my experience are effective_cache_size and shared_buffers. This is a text I like (it's for pg 7.4 but still useful):

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

--
/Dennis Björklund
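The nightly-cron option mentioned above might look like this; the user, database name and schedule are assumptions to adapt:

```
# crontab entry: VACUUM ANALYZE every night at 03:00
0 3 * * *  psql -U postgres -d mydb -c 'VACUUM ANALYZE;'
```

pg_autovacuum instead watches activity and vacuums only tables that have changed enough, which is gentler on mostly-static databases.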
Re: [PERFORM] Need for speed
On Tue, 16 Aug 2005, Ulrich Wisser wrote:
> Still when several users are online the reporting gets very slow.
> Queries can take more than 2 min.

Could you show an example of such a query and the output of EXPLAIN ANALYZE on that query (preferably done when the database is slow)? It's hard to say what is wrong without more information.

--
/Dennis Björklund
Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0
On Wed, 27 Jul 2005, Matthew Schumacher wrote:
> Then they do this to insert the token:
>
> INSERT INTO bayes_token (
>   id,
>   token,
>   spam_count,
>   ham_count,
>   atime
> ) VALUES (
>   ?, ?, ?, ?, ?
> ) ON DUPLICATE KEY UPDATE
>   spam_count = GREATEST(spam_count + ?, 0),
>   ham_count = GREATEST(ham_count + ?, 0),
>   atime = GREATEST(atime, ?)
>
> Or update the token:
>
> UPDATE bayes_vars SET
>   $token_count_update
>   newest_token_age = GREATEST(newest_token_age, ?),
>   oldest_token_age = LEAST(oldest_token_age, ?)
> WHERE id = ?
>
> I think the reason why the procedure was written for postgres was
> because of the greatest and least statements performing poorly.

How can they perform poorly when they are dead simple? Here are 2 functions that work for the above cases of greatest and least:

CREATE FUNCTION greatest_int (integer, integer)
RETURNS INTEGER
IMMUTABLE STRICT
AS 'SELECT CASE WHEN $1 < $2 THEN $2 ELSE $1 END;'
LANGUAGE SQL;

CREATE FUNCTION least_int (integer, integer)
RETURNS INTEGER
IMMUTABLE STRICT
AS 'SELECT CASE WHEN $1 < $2 THEN $1 ELSE $2 END;'
LANGUAGE SQL;

These should be inlined by pg and very fast to execute. I wrote a function that should do what the insert above does. The update I've not looked at (I don't know what $token_count_update is), but the update looks simple enough to just implement the same way in pg as in mysql.
For the insert-or-replace case you can probably use this function:

CREATE FUNCTION insert_or_update_token
  (xid INTEGER, xtoken BYTEA,
   xspam_count INTEGER, xham_count INTEGER, xatime INTEGER)
RETURNS VOID AS
$$
BEGIN
  LOOP
    UPDATE bayes_token
       SET spam_count = greatest_int (spam_count + xspam_count, 0),
           ham_count  = greatest_int (ham_count + xham_count, 0),
           atime      = greatest_int (atime, xatime)
     WHERE id = xid
       AND token = xtoken;

    IF found THEN
      RETURN;
    END IF;

    BEGIN
      INSERT INTO bayes_token
        VALUES (xid, xtoken, xspam_count, xham_count, xatime);
      RETURN;
    EXCEPTION WHEN unique_violation THEN
      -- do nothing; loop and try the update again
    END;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

It's not really tested, so I can't tell if it's faster than what you have. What it does do is mimic the way you insert values in mysql. It only works on pg 8.0 and later, however, since the exception handling was added in 8.0.

--
/Dennis Björklund
Re: [PERFORM] performance problems ... 100 cpu utilization
David Mitchell wrote:
> If your table has got into this state, then vacuum analyze won't fix it. You will have to do a vacuum full to get it back to normal, then regularly vacuum (not full) to keep it in good condition. We vacuum our critical tables every 10 minutes to keep them in good nick.

So should I have vacuum run during the load test? At what level of updates should it run every ten minutes?

Dennis
Re: [PERFORM] performance problems ... 100 cpu utilization
David Mitchell wrote:
> What is the load average on this machine? Do you do many updates? If you do a lot of updates, perhaps you haven't vacuumed recently. We were seeing similar symptoms when we started load testing our stuff and it turned out we were vacuuming too infrequently.

The load average at the 100% utilization point was about 30! A vacuum analyze was done before the test was started. I believe there are many more selects than updates happening at any one time.

Dennis
Re: [PERFORM] performance problems ... 100 cpu utilization
Qingqing Zhou wrote:
> Are you sure 100% CPU usage is solely contributed by Postgresql? Also, from the ps status you list, I can hardly see that's a problem because of the problem you mentioned below.

The PostgreSQL processes are what is taking up all the CPU. There aren't any other major applications on the machine. It's a dedicated database server, only for this application.

It doesn't seem to make sense that PostgreSQL would be maxed out at this point. I think given the size of the box, it could do quite a bit better. So, what is going on? I don't know.

Dennis
[PERFORM] performance problems ... 100 cpu utilization
Hi,

I've got a Java-based web application that uses PostgreSQL 8.0.2. PostgreSQL runs on its own machine with RHEL 3, an ia32e kernel, dual Xeon processors, and 4 GB RAM.

The web application runs on a separate machine from the database. The application machine has three Tomcat instances configured to use 64 database connections each, using DBCP for pooling. Most of the data access is via Hibernate. The database itself is about 100 MB in size.

We're perf testing the application with LoadRunner. At about 500 virtual users hitting the web application, the CPU utilization on the database server is at 100%; PostgreSQL is on its knees. The memory usage isn't bad, the I/O isn't bad, only the CPU seems to be maxed out.

Checking the status of connections at this point (ps -eaf | grep "postgres:") where the CPU is maxed out, I saw this:

127 idle
12 bind
38 parse
34 select

Hibernate is used in the application and unfortunately this seems to cause queries not to get logged (see http://archives.postgresql.org/pgsql-admin/2005-05/msg00241.php).

I know there has been discussion about problems on Xeon MP systems. Is this what we are running into? Or is something else going on? Is there other information I can provide that might help determine what is going on?

Here are the postgresql.conf settings:

# The maximum number of connections.
max_connections = 256

# Standard performance-related settings.
shared_buffers = 16384
max_fsm_pages = 20
max_fsm_relations = 1
fsync = false
wal_sync_method = fsync
wal_buffers = 32
checkpoint_segments = 6
effective_cache_size = 38400
random_page_cost = 2
work_mem = 16384
maintenance_work_mem = 16384

# TODO - need to investigate these.
commit_delay = 0
commit_siblings = 5
max_locks_per_transaction = 512
Re: [PERFORM] General DB Tuning
Tom Arthurs wrote:
> we are using jdbc -- the "log_min_duration_statement = 3000" statement works fine for me. Looks like there's no other workaround for the bug(?). Not sure since I have no interest in logging a million statements a day, I only want to see the poorly performing hits.

Doesn't it depend on which JDBC driver you are using?

Dennis
Re: [PERFORM] planner picking more expensive plan
On Fri, 1 Jul 2005, Sam Mason wrote:

The key thing with the query Sam has is that if you turn off seqscan you get the first plan, which runs in 0.4ms, and if seqscan is on the runtime is 27851ms. There are 100 ways to make it select the seq scan, including rewriting the query to something more useful, tweaking different parameters and so on. The interesting part is that pg gives the fast plan a cost of 202 and the slow one a cost of 566141, but it still chooses the slow plan unless seqscan is turned off (or some other tweak with the same effect). It knows very well that the plan with the index scan will be much faster, it just doesn't manage to generate it unless you force it to. It makes you wonder if pg throws away some plans too early in the planning phase.

> Limit  (cost=0.00..202.52 rows=10 width=8) (actual time=0.221..0.600 rows=10 loops=1)
>   ->  Merge Left Join  (cost=0.00..6628.30 rows=3302780 width=8) (actual time=0.211..0.576 rows=10 loops=1)
>         Merge Cond: ("outer".animalid = "inner".animalid)
>         ->  Index Scan using animals_pkey on animals a  (cost=0.00..10198983.91 rows=3302780 width=8) (actual time=0.112..0.276 rows=10 loops=1)
>         ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual time=0.088..0.235 rows=10 loops=1)
>               Filter: (mtypeid = 0)
> Total runtime: 0.413 ms
>
> Limit  (cost=565969.42..566141.09 rows=10 width=8) (actual time=27769.047..27769.246 rows=10 loops=1)
>   ->  Merge Right Join  (cost=565969.42..57264070.77 rows=3302780 width=8) (actual time=27769.043..27769.228 rows=10 loops=1)
>         Merge Cond: ("outer".animalid = "inner".animalid)
>         ->  Index Scan using movement_animal on movements m  (cost=0.00..56642740.73 rows=3107737 width=8) (actual time=0.022..0.154 rows=10 loops=1)
>               Filter: (mtypeid = 0)
>         ->  Sort  (cost=565969.42..574226.37 rows=3302780 width=8) (actual time=27768.991..27769.001 rows=10 loops=1)
>               Sort Key: a.animalid
>               ->  Seq Scan on animals a  (cost=0.00..77086.80 rows=3302780 width=8) (actual time=0.039..5620.651 rows=3303418 loops=1)
> Total runtime: 27851.097 ms

Another thing to notice is that if one removes the Limit node, the situation is reversed and the plan pg chooses (with the Limit node) is the one with the lowest cost. The startup cost is however very high, so combining that Merge Join with a Limit will of course produce something slow compared to the upper plan, where the startup cost is 0.0.

A stand-alone test case would be nice, but even without one the above plans are interesting.

--
/Dennis Björklund
Re: [PERFORM] ported application having performance issues
On Thu, 30 Jun 2005, John Mendenhall wrote:
> Our setting for effective_cache_size is 2048.
>
> random_page_cost = 4, effective_cache_size = 2048   time approximately 4500ms
> random_page_cost = 3, effective_cache_size = 2048   time approximately 1050ms
> random_page_cost = 3, effective_cache_size = 4096   time approximately 1025ms
>
> The decrease of random_page_cost to 3 caused the plan
> to work properly, using the lead_requests table as a
> join starting point and using the contacts index.

The effective_cache_size still looks small. As a rule of thumb you might want effective_cache_size to be something like 1/2 or 2/3 of your total memory. I don't know how much you have, but effective_cache_size = 4096 is only 32M.

shared_buffers and effective_cache_size are normally the two most important settings in my experience.

--
/Dennis Björklund
Re: [PERFORM] ETL optimization
On Thu, 23 Jun 2005, Bricklen Anderson wrote:
> iii. UNIQUE constraint on table "t1". This didn't seem to perform too
> badly with fewer rows (preliminary tests), but as you'd expect, on error
> the whole transaction would roll back. Is it possible to skip a row if
> it causes an error, as opposed to aborting the transaction altogether?

You don't need to roll back the whole transaction if you use savepoints or the exception features in pl/pgsql. Take a look at this example:

http://developer.postgresql.org/docs/postgres/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

--
/Dennis Björklund
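The savepoint variant can be sketched like this (table and savepoint names are invented): each risky insert gets its own savepoint, and on error you roll back only to the savepoint, so the rest of the transaction survives:

```sql
BEGIN;
SAVEPOINT before_row;
INSERT INTO t1 (id, val) VALUES (1, 'x');
-- if this raises unique_violation, the loading code issues:
--   ROLLBACK TO SAVEPOINT before_row;
-- and simply continues with the next row
RELEASE SAVEPOINT before_row;
-- ... repeat per row, then commit the survivors ...
COMMIT;
```

A per-row savepoint has some overhead, so for very large loads the pl/pgsql exception-block approach from the linked example, or pre-filtering duplicates with a join, may be faster.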
Re: [PERFORM] Needed: Simplified guide to optimal memory configuration
On Wed, 15 Jun 2005, Todd Landfried wrote:
> So, what I need is to be pointed to (or told) what are the best
> settings for our database given these memory configurations. What
> should we do?

Maybe this will help:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

> NOTICE: shared_buffers is 256

This looks like it's way too low. Try something like 2048.

--
/Dennis Björklund
Re: [PERFORM] Query plan changes after pg_dump / pg_restore
On Thu, 9 Jun 2005, Jona wrote:
> It's the same (physical) server as well as the same PostGreSQL daemon,
> so yes.

The only things that can differ then are the statistics collected and the amount of dead space in tables and indexes (but since you both reindex and run vacuum full, that should not be it). So comparing the statistics in the system tables is the only thing I can think of that might bring some light on the issue. Maybe someone else has some ideas.

And as KL said, the effective_cache_size looked like it was way too small. With that setting bigger, pg should select index scans more often. It doesn't explain why the databases behave like they do now, but it might make pg select the same plan nevertheless.

--
/Dennis Björklund
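One way to compare the collected statistics is to run something like this on both databases and diff the output (the table name is just an example):

```sql
SELECT attname, null_frac, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'mytable'
ORDER BY attname;
```

Large differences in n_distinct or correlation for the join and filter columns would explain why the planner picks different plans before and after the dump/restore.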
Re: [PERFORM] Query plan changes after pg_dump / pg_restore
On Thu, 9 Jun 2005, [EMAIL PROTECTED] wrote:
> I am continuously encountering an issue with query plans that change after
> a pg_dump / pg_restore operation has been performed.
>
> I have placed an export of the query, query plan etc. online at:
> http://213.173.234.215:8080/plan.htm in order to ensure it's still
> readable.

There is not a major difference in time, so pg is at least not way off (225ms vs. 280ms). The estimated cost is, however, not very related to the runtime (117 vs. 1389).

What you have not shown is whether the database is properly tuned. The output of SHOW ALL; could help explain a lot, together with info on how much memory your computer has. The first thing that comes to mind is that you probably have not tuned shared_buffers and effective_cache_size properly (SHOW ALL would tell).

--
/Dennis Björklund
Re: [PERFORM] COPY vs INSERT
On Fri, 6 May 2005, Jim C. Nasby wrote: > Has thought been given to supporting inserting multiple rows in a single > insert? DB2 supported: > > INSERT INTO table VALUES( > (1,2,3), > (4,5,6), > (7,8,9) > ); > > I'm not sure how standard that is or if other databases support it. The SQL standard includes this, except that you cannot have the outer (). So it should be INSERT INTO table VALUES (1,2,3), (4,5,6), (7,8,9); Does DB2 demand those extra ()? -- /Dennis Björklund ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] index scan on =, but not < ?
On Tue, 8 Mar 2005, Rick Schumeyer wrote: > =# explain select * from data where x = 0; > - > Index Scan using data_x_ix on data (cost=0.00..78.25 rows=19 width=34) >Index Cond: (x = 0::double precision) > > But this command, in which the only difference if > instead of =, is a > sequential scan. > > =# explain select * from data where x > 0; > -- > Seq Scan on data (cost=0.00..1722605.20 rows=62350411 width=34) >Filter: (x > 0::double precision) > > Why is this? That is because it's faster to execute the x>0 query with a seq. scan than with an index scan. Postgresql is doing the right thing here. Pg estimates that the first query will return 19 rows and that the second query will return 62350411 rows. To return 62350411 rows it's faster to just scan the table and not use the index. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Performance Anomalies in 7.4.5
On Thu, 21 Oct 2004, Thomas F.O'Connell wrote: > Aggregate (cost=0.02..0.02 rows=1 width=8) (actual > time=298321.421..298321.422 rows=1 loops=1) > -> Nested Loop (cost=0.00..0.01 rows=1 width=8) (actual > time=1.771..298305.531 rows=2452 loops=1) > Join Filter: ("inner".id = "outer".id) > -> Seq Scan on userdata u (cost=0.00..0.00 rows=1 width=8) > (actual time=0.026..11.869 rows=2452 loops=1) > -> Seq Scan on userdata_history h (cost=0.00..0.00 rows=1 > width=8) (actual time=0.005..70.519 rows=41631 loops=2452) > Filter: (id = 18181::bigint) It looks like you have not run ANALYZE recently. Most people run VACUUM ANALYZE every night (or similar) in a cron job. -- /Dennis Björklund ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] First set of OSDL Shared Mem scalability results, some
On Fri, 8 Oct 2004, Josh Berkus wrote: > As you can see, the "sweet spot" appears to be between 5% and 10% of RAM, > which is if anything *lower* than recommendations for 7.4! What recommendation is that? Having shared buffers be about 10% of RAM sounds familiar to me. What was recommended for 7.4? In the past we used to say that the worst value is 50%, since then the same things might be cached both by pg and the os disk cache. Why do we expect the shared buffer sweet spot to change because of the new ARC stuff? And why would bigger shared mem make it better? Wouldn't it be the opposite: now we don't invalidate as much of the cache for vacuums and seq. scans, so we can cache as well as before but with less shared buffers. That said, testing and getting some numbers for good shared mem sizes is good. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
On 22 Sep 2004, Greg Stark wrote: > Actually this looks like it's arguably a bug to me. Why does the hash > join execute the sequential scan at all? Shouldn't it also like the > merge join recognize that the other hashed relation is empty and skip > the sequential scan entirely? I'm not sure you can classify that as a bug. It's just that in one of the plans it started with the empty scan and because of that didn't need the other, but with the hash join it started with the table that had 16 rows and then got to the empty one. While I haven't checked, I assume that if it had started with the empty table it would have skipped the other. I don't know what criteria are used to select which part to start with when doing a hash join. It looks like it started with the one that had the highest row estimate here; doing it the other way around might be a good idea, because in some cases you are lucky enough to find an empty scan and can omit the other. The above are just observations of the behaviour; I've not looked at the source at all. -- /Dennis Björklund ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
On Wed, 22 Sep 2004, Gaetano Mendola wrote: > Limit (cost=10.21..10.21 rows=1 width=24) (actual time=0.885..0.885 rows=0 > loops=1) > Limit (cost=10.28..10.28 rows=1 width=24) (actual time=0.429..0.429 rows=0 > loops=1) These estimated costs are almost the same, but the runtimes differ a bit. This means that maybe you need to alter settings like random_page_cost, effective_cache_size and maybe some others to make the cost reflect the runtime better. Since the costs are so close to each other, very small changes can make it choose the other plan. It's also very hard to make an estimate that is correct in all situations. That's why it's called an estimate, after all. -- /Dennis Björklund ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] 7.4 vs 7.3 ( hash join issue )
On Wed, 22 Sep 2004, Gaetano Mendola wrote: > Now my question is why the 7.4 choose the hash join ? :-( It looks to me like the merge join is faster because there wasn't really anything to merge; it resulted in 0 rows. Maybe the hash join that is chosen in 7.4 would have been faster had there been a couple of result rows (just a guess). It would be interesting to compare the plans in 7.4 with and without hash_join active and see what costs it estimates for a merge join compared to a hash join. -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres
On Sat, 11 Sep 2004, Mark Cotner wrote: > There are no performance concerns with MySQL, but it would benefit > greatly from stored procedures, views, etc. It is a very large rolling > data warehouse that inserts about 4.5 million rows every 2 hours and > subsequently rolls this data off the back end of a 90 day window. While it is impossible to know without testing, postgresql has the benefit of readers and writers that do not block each other. So in situations where you do lots of concurrent inserts and selects postgresql should behave well. > Merge table definition equivalent. We use these extensively. As far as I can tell a merge table in mysql is the same as a view over a number of unions of other tables, and possibly a rule that defines how inserts will be done if you insert into the merged table. > Merge table equivalent with all tables containing over 100M rows(and > about 40 columns, some quite wide) will need to do index scans in at > least 5 seconds(MySQL currently does 2, but we can live with 5) and > return ~200 rows. Since each table that is merged will have its own index, the speed should be proportional to the number of tables. Index scans in themselves are very fast, but if you have 30 tables you need 30 index scans. Also, are you sure you really need merge tables? With pg having row locks and mvcc, maybe you could go for a simpler model with just one big table. Often you can also combine that with partial indexes to get a smaller index to use for lots of your queries. > Thoughts, suggestions? I see nothing in what you have written that indicates that pg can not do the job, and do it well. It's however very hard to know exactly what the bottleneck is before one tries. There are lots of cases where people have converted mysql applications to postgresql and have gotten a massive speedup. You could be lucky and have such a case, who knows..
I spend some time each day supporting people using postgresql in the #postgresql irc channel (on the freenode.net network). There I talk to people doing both small and big conversions, and the majority are very happy with postgresql's performance. Postgresql has gotten faster with each release, and while speed was a fair argument against it a number of years ago, it's not like that today. That said, in the end it depends on the application. We are all interested in how it goes (well, at least I am :-), so feel free to send more mails keeping us posted. Good luck. -- /Dennis Björklund ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] The usual sequential scan, but with LIMIT !
On Mon, 6 Sep 2004, [iso-8859-15] Pierre-Frédéric Caillaud wrote: > Why is it that way ? The planner should use the LIMIT values when > planning the query, should it not ? And it does use the limit values; the estimated cost was lower when you had the limit. What you need to do is tune pg for your computer. For example the following settings: * effective_cache_size - this setting tells pg how much the os is caching (for example use top to find out during a normal work load). You said that the tables fit in memory, and by telling pg how much is cached it might adjust its plans accordingly. * random_page_cost - how expensive a random access is compared to seq. access. This depends on the computer and disk system you have. If the setting above does not help, maybe you need to lower this variable to 2 or something. And don't forget the shared_buffers setting, but most people usually have it tuned in my experience (though usually too high). Here is an article that might help you: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
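Those two settings can be experimented with per session before changing postgresql.conf; a sketch, where the values are only illustrative guesses and must be tuned to the actual machine:

```sql
-- effective_cache_size is in 8kB pages: 50000 pages is roughly 400MB
-- of OS cache. random_page_cost = 2 makes random I/O look cheaper,
-- nudging the planner toward index scans.
SET effective_cache_size = 50000;
SET random_page_cost = 2;
-- Then re-run EXPLAIN ANALYZE on the slow query and see if the
-- plan (and the runtime) changes.
```

Doing it with SET first means a bad guess only affects the current session instead of the whole server.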
Re: [PERFORM] Why those queries do not utilize indexes?
On Fri, 27 Aug 2004, Artimenko Igor wrote: > 1. Sequential search and very high cost if set enable_seqscan to on; > Seq scan on messageinfo ( cost=0.00..24371.30, rows =36802 ) > > 2. Index scan but even bigger cost if set enable_seqscan to off; > Index messagesStatus on messageinfo ( Cost=0.00..27220.72, rows=36802 ) So pg thinks that a sequential scan will be a little bit faster (The cost is a little bit smaller). If you compare the actual runtimes maybe you will see that pg was right. In this case the cost is almost the same so the runtime is probably almost the same. When you have more data pg will start to use the index since then it will be faster to use an index compared to a seq. scan. -- /Dennis Björklund ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Optimizer Selecting Incorrect Index
On Wed, 25 Aug 2004, Richard Huxton wrote: > > Index Scan using trptserc on trans (cost=0.00..465.10 rows=44 width=118) > >Index Cond: (trn_patno = 19) > >Filter: ((trn_old_date >= '1994-08-23'::date) AND (trn_old_date <= > > '2004-08-23'::date) AND (trn_bill_inc = 'B'::bpchar)) > > (687 rows) > > > Index Scan using todate on trans (cost=0.00..105165.74 rows=1 width=143) > >Index Cond: ((trn_old_date >= '1994-08-23'::date) AND (trn_old_date <= > > '2004-08-23'::date)) > >Filter: ((trn_patno = 19) AND (trn_bill_inc = 'B'::bpchar)) > > (713 rows) > > These queries are different. The first returns 687 rows and the second > 713 rows. The 687 and 713 are the number of rows in the plan, not the number of rows the queries return. -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Performance Bottleneck
On Tue, 3 Aug 2004, Martin Foster wrote: > to roughly 175 or more. Essentially, the machine seems to struggle > to keep up with continual requests and slows down respectively as > resources are tied down. I suggest you try to find the queries that are slow and check whether the plans are optimal for those queries. There are logging options for logging queries that run longer than a user-set limit. That can help find the slow queries. Just doing some logging for some typical page fetches often shows things that can be done better. For example, it's not uncommon to see the same information being pulled several times by mistake. Maybe you can also try something like connection pooling. I'm not sure how much that can give, but for small queries the connection time is usually the big part. > Would disabling 'fsync' provide more performance if I choose that > information may be lost in case of a crash? I would not do that. In most cases the performance increase is modest and the data corruption risk after a crash is much bigger, so it's not worth it. If you have a lot of small inserts then it might be faster with this, but if possible it's much better to try to do more work per transaction instead. -- /Dennis Björklund ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Query gets slow when where clause increases
On Tue, 29 Jun 2004, James Antill wrote: >-> Index Scan using idx_ticket_groups_assigned on ticket_groups g > (cost=0.00..241.76 rows=5 width=20) (actual time=0.13..12.67 rows=604 loops=1) > Index Cond: (assigned_to = 1540) Here the planner estimated that it would find 5 rows, but it did find 604. I take that as a sign that you have not run VACUUM ANALYZE recently? If you have done that, then maybe you need to change the statistics target for that column. Before you set it on that column you could try to just alter the default statistics target for one session like this: SET default_statistics_target TO 100; ANALYZE; and then see if you get a better plan when you run the query afterwards. If it helps you can either set default_statistics_target in postgresql.conf or set it just for some column using ALTER TABLE. -- /Dennis Björklund ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] no index-usage on aggregate-functions?
On Tue, 29 Jun 2004, Harald Lau (Sector-X) wrote: > > Average and sum can never use an index AFAIK, in any db server. You > > need information from every row. > > Take a look at the SQLSrv-pendant: > create index x_1 on the_table (num_found) > select avg(num_found) from the_table > -> Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1]) But the question is whether it really is faster. This sum needs all the values in that column. As far as I know it uses the index because the index takes less space on disk and thus is a little faster due to less IO. In pg the index doesn't work like that, so in pg it's faster to sum all values using the table itself. If you have a WHERE clause to only sum some values, then pg will use an index (if applicable) and you will see a speedup. For min and max the situation is different; there an index can give you the answer without scanning all rows. For that the workaround exists in pg. The pg aggregate functions are very general and no one has special-cased min/max yet. Until that happens the workaround works and is fast. > So, it seems that PG is not s well suited for a datawarehouse and/or > performing extensive statistics/calculations/reportings on large tables, > is it? I don't see how you can conclude that from your example. Just because it uses an index for the sum above does not mean that it is a lot faster. It still has to do as many additions as pg does. Sure, mvcc is best when you have both reads and writes, but it should still be comparable in speed even if you only do reads. -- /Dennis Björklund ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
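The min/max workaround mentioned above is to rewrite the aggregate as an ordered, limited select, which can descend the index and stop after one row (using the column and table names from the example):

```sql
-- Instead of:  SELECT max(num_found) FROM the_table;
-- this form can use the index on num_found and read a single row:
SELECT num_found
FROM the_table
ORDER BY num_found DESC
LIMIT 1;

-- The min() equivalent uses ORDER BY num_found ASC LIMIT 1.
```

Note that NULLs sort last in descending order here, so on a column with NULLs the two forms are not strictly identical.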
Re: [PERFORM] Slow vacuum performance
On Thu, 17 Jun 2004, Patrick Hatcher wrote: > I have 6 million row table that I vacuum full analyze each night. The time > seems to be streching out further and further as I add more rows. I read You could try to run normal (non full) vacuum every hour or so. If you do normal vacuum often enough you probably don't need to run vacuum full at all. -- /Dennis Björklund ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Quad processor options
On Tue, 11 May 2004, Bjoern Metzdorf wrote: > I am curious if there are any real life production quad processor setups > running postgresql out there. Since postgresql lacks a proper > replication/cluster solution, we have to buy a bigger machine. Do you run the latest version of PG? I've read the thread but have not seen any information about the pg version. All I've seen was a reference to debian, which might just as well mean that you run pg 7.2 (probably not, but I have to ask). Some classes of queries run much faster in pg 7.4 than in older versions, so if you are lucky that can help. -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] planner/optimizer question
On Fri, 30 Apr 2004, Gary Doades wrote: > I should have also pointed out that MSSQL reported that same index scan > as taking 65% of the overall query time. It was just "faster". The > overall query took 103ms in MSSQL. Are your results based on a single client accessing the database and no concurrent updates? Would adding more clients, and maybe having some client that updates/inserts into the tables, still make mssql faster than pg? Maybe it's as simple as pg being optimized for more concurrent users than mssql? I'm just asking; I don't know much about the inner workings of mssql. -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Horribly slow hash join
On Sun, 18 Apr 2004, Bruno Wolff III wrote: > Another option would be to put the numbers into two int4s. For int4 or > smaller types one of these would be zero. int8s would be split between > the two. The hash function would then be defined on the two int4s. Sure, this is an internal calculation in the hash function. The only important thing is that the number 7 (for example) gives the same hash value no matter whether it is an int2 or an int8, and that the hash function works well also for int8 numbers (which it does not today). At least those were the properties I understood we wanted. We got sidetracked into talking about which datatypes exist on all platforms; that's not an issue at all. -- /Dennis Björklund ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Horribly slow hash join
On Sun, 18 Apr 2004, Tom Lane wrote: > > What do you mean? int8 is supported on all platformas > > No it isn't. So on platforms where it isn't, you would use int4 as the biggest int. I don't really see that as a problem. As long as you calculate the hash on the biggest int on that platform it should work. -- /Dennis Björklund ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Horribly slow hash join
On Sun, 18 Apr 2004, Tom Lane wrote: > That creates portability issues though. We do not depend on there being > a 64-bit-int type for anything except int8 itself, and I don't want to > start doing so. What do you mean? int8 is supported on all platforms, and if the hash function converted all numbers to int8 before computing the hash it would work. I don't see any portability problems. -- /Dennis Björklund ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Horribly slow hash join
On Sat, 17 Apr 2004, Tom Lane wrote: > *some* set of inputs. (Also, I have been harboring some notions of > supporting cross-type hash joins for integer types, which will not work > unless small int8 values hash the same as int4 etc.) The simple solution would be to always extend integers to 64 bits (or whatever the biggest integer is) before calculating the hash. It makes the hash function a little slower for smaller types, but it's mostly an operation in the cpu with no memory involved, so it's probably not noticeable. -- /Dennis Björklund ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] select count(*) on large tables
On Thu, 8 Apr 2004, Cosimo Streppone wrote: > The alternative solution I tried, that has an optimal > speed up, unfortunately is not a way out, and it is based > on "EXPLAIN SELECT count(*)" output parsing, which > is obviously *not* reliable. Try this to get the estimate: SELECT relname, reltuples FROM pg_class ORDER BY relname; > The times always get better doing a vacuum (and eventually > reindex) of the table, and they slowly lower again. Yes, the estimate is updated by the analyze. > Is there an estimate time for this issue to be resolved? It's not so easy to "fix". The naive fixes make other operations slower, most notably by making things less concurrent, which is bad since it won't scale as well for many users. You can always keep the count yourself and have triggers that update the count on each insert and delete on the table. It will of course make all inserts and deletes slower, but if you count all rows often maybe it's worth it. Most people do not need to count all rows in a table anyway; you usually count the rows matching some condition. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
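The trigger-maintained count could be sketched like this in PL/pgSQL (table and function names here are made up for illustration; the quoted function body is the pre-8.0 syntax):

```sql
-- One-row side table holding the current count of 'mytable'.
CREATE TABLE mytable_count (n bigint NOT NULL);
INSERT INTO mytable_count VALUES (0);

CREATE FUNCTION mytable_count_trig() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE mytable_count SET n = n + 1;
    ELSE
        UPDATE mytable_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- result ignored for AFTER triggers
END;
' LANGUAGE plpgsql;

CREATE TRIGGER mytable_count_upd
    AFTER INSERT OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

-- Counting is now a single-row read:
SELECT n FROM mytable_count;
```

Note the trade-off: every insert and delete now also updates the one count row, which serializes concurrent writers on that row.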
Re: [PERFORM] good pc but bad performance,why?
On Wed, 7 Apr 2004, huang yaqin wrote: > You said turning fsync off may cause losing data, that's terrible. I use > SCSI disk, and file system is ext3. I tune postgresql.conf and can't get > any improvement. So what can I do? Make sure you do as much as possible inside one transaction. If you want to do 1000 inserts, then do BEGIN; insert ; insert; ... ; COMMIT; -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] simple query join
On Mon, 8 Mar 2004, Chris Smith wrote: > assetid| integer | not null default 0 > assetid| character varying(255) | not null default '0' The types above do not match, and these are the attributes you use to join. -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] speeding up a select with C function?
On Sun, 7 Mar 2004, David Teran wrote: > we need to optimize / speed up a simple select: > > explain analyze select > ((t0.int_value-t1.int_value)*(t0.int_value-t1.int_value)) > from job_property t0, job_property t1 > where t0.id_job_profile = 5 > and t1.id_job_profile = 6 > and t1.id_job_attribute = t0.id_job_attribute > and t1.int_value < t0.int_value; Try to add an index on (id_job_profile, id_job_attribute) or maybe even (id_job_profile, id_job_attribute, int_value) -- /Dennis Björklund ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
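The suggested indexes, written out (the index names are made up):

```sql
-- Two-column index matching the t0/t1 lookups on
-- (id_job_profile, id_job_attribute)...
CREATE INDEX job_property_profile_attr
    ON job_property (id_job_profile, id_job_attribute);

-- ...or the wider variant that also includes int_value:
CREATE INDEX job_property_profile_attr_val
    ON job_property (id_job_profile, id_job_attribute, int_value);
```

Run EXPLAIN ANALYZE on the query again after creating one of them to see whether the planner picks it up and what it does to the runtime.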
Re: [PERFORM] Fixed width rows faster?
On Sat, 6 Mar 2004, Andrew Sullivan wrote: > > places is to ensure that the column can be indexed. Postgres, it seems, > > refuses to insert a string that is longer than some value into an > > indexed column, and I'll rather have such errors flagged while inserting > > Care to provide some details of this? It sure sounds like a bug to > me, if it's true. I've never run into anything like this, though. There is a limit of the size of values that can be indexed. I think it's 8k or something (a block I assume). Something for someone with an itch to fix in the future. The error however comes when you try to insert the value. Doing a reindex will not change the length of the value and will always work. -- /Dennis Björklund ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Fixed width rows faster?
On Fri, 5 Mar 2004, Jeff wrote: > Is there any performance advantage to using a fixed width row (with PG)? As far as I know there is only a small win when you want to extract some field from a tuple and with variable width fields you have to walk to the correct field. But this is a small performance problem unless you have very many variable size columns in the table. > different databases and we have also some custom software here that > uses fixed width rows to be able to hit row N in O(1) This can not happen in pg since there is no row N. Every transaction can have a different view of the table, some rows are visible and some others are not. To find row N you have to walk from the start and inspect every tuple to see if it's visible to this transaction or not. -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] slow database
On Wed, 11 Feb 2004 [EMAIL PROTECTED] wrote: > somebody please knows to give tips to me to increase the > performance Run VACUUM ANALYZE. Find one query that is slow. Run EXPLAIN ANALYZE on that query. Read the plan and figure out why it is slow. Fix it. -- /Dennis Björklund ---(end of broadcast)--- TIP 8: explain analyze is your friend
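That workflow, sketched (the table and query are placeholders, standing in for whichever query is slow):

```sql
-- 1. Refresh the planner's statistics and reclaim dead space.
VACUUM ANALYZE;

-- 2. See where the time actually goes in one slow query:
--    compare the estimated rows against the actual rows in the output.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
```

Big gaps between estimated and actual row counts in the EXPLAIN ANALYZE output are usually where the fixing should start.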
Re: [PERFORM] query optimization question
On Thu, 29 Jan 2004, Jack Coates wrote: > > Probably better to repost it as a gzip'd attachment. That should > > complete with a picture of the GUI version. 26k zipped, let's see if > this makes it through. Are you sure you attached it? At least when it got here there was no attachment. -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] query optimization question
On Thu, 29 Jan 2004, Tom Lane wrote: > > jackdb-# GROUP BY memberid_ HAVING ( > > Um, that's not what I had in mind at all. Does GROUP BY actually do > anything at all here? (You didn't answer me as to whether memberid_ > is a unique identifier or not, but if it is, this GROUP BY is just an > expensive no-op.) From your comment I assume that there is no transformation in pg that detects that the group by columns are unique? > this is all blue-sky speculation anyway. What I'm actually more > interested in is your statement that MSSQL can do the original query > quickly. I find that a bit hard to believe because I don't see any > relevant optimization techniques. Would getting rid of the group by not give that kind of speedup? Maybe mssql manages to rewrite the query like that before executing it. -- /Dennis Björklund ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Really slow even after vacuum
On Wed, 21 Jan 2004, Jeroen Baekelandt wrote: > jms_messages again. It takes 80 seconds!?! While before, with 1000 > records, it took only a fraction of a second. run: VACUUM FULL ANALYZE; -- /Dennis Björklund ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] annoying query/planner choice
On Sun, 11 Jan 2004, Andrew Rawnsley wrote: > 20-25% of the time. Fiddling with CPU_TUPLE_COST doesn't do anything > until I exceed 0.5, which strikes me as a bit high (though please > correct me if I am assuming too much...). RANDOM_PAGE_COST seems to have > no effect. What about the effective cache size, is that set properly? -- /Dennis Björklund ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Explain not accurate
On Fri, 9 Jan 2004, Richard van den Berg wrote: > problems. However, I've run into an issue where explain tells us a the > costs of a quiry are tremendous (105849017586), but the query actually > runs quite fast. Even "explain analyze" shows these costs. It would be helpful if you can show the query and the EXPLAIN ANALYZE of the query (and not just EXPLAIN). > This makes me wonder: can the estimates explain shows be dead wrong? Of course they can. An estimation is just an estimation. If you have not analyzed the database then it's most likely wrong. Dead wrong is not common, but not impossible. Run VACUUM ANALYZE and see if the estimate is better after that. -- /Dennis Björklund ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Slow query problem
On Fri, 9 Jan 2004, Richard Huxton wrote: > > > select invheadref, invprodref, sum(units) > > > from invtran > > > group by invheadref, invprodref > > > For the above query, shouldn't you have one index for both columns > > (invheadref, invprodref). Then it should not need to sort at all to do the > > grouping and it should all be fast. > > Not sure if that would make a difference here, since the whole table is being > read. The goal was to avoid the sorting which should not be needed with that index (I hope). So I still think that it would help in this case. -- /Dennis Björklund ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Slow query problem
On Thu, 8 Jan 2004, Bradley Tate wrote: > We've set up a little test box (1GHz Athlon, 40G IDE drive, 256M RAM, > Redhat 9) to do some basic comparisons between postgresql and firebird > 1.0.3 and 1.5rc8. Mostly the results are comparable, with one > significant exception. > > QUERY > select invheadref, invprodref, sum(units) > from invtran > group by invheadref, invprodref > > RESULTS > pg 7.3.4 - 5.5 min > pg 7.4.0 - 10 min > fb 1.0.3 - 64 sec > fb 1.5 - 44 sec > > * The invtran table has about 2.5 million records, invheadref and > invprodref are both char(10) and indexed. For the above query, shouldn't you have one index for both columns (invheadref, invprodref). Then it should not need to sort at all to do the grouping and it should all be fast. -- /Dennis Björklund ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
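The combined index suggested above would be (the index name is made up):

```sql
-- One index over both grouping columns, so rows can be read in
-- (invheadref, invprodref) order and grouped without a separate sort.
CREATE INDEX invtran_head_prod ON invtran (invheadref, invprodref);
```

Whether the planner actually chooses an index walk over a sort here depends on the version and settings, so check with EXPLAIN ANALYZE before and after.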
Re: [PERFORM] Question about difference in performance of 2 queries
On Mon, 29 Dec 2003, Sean Shanny wrote: > The first plan below has horrendous performance. we only get about 2% > CPU usage and iostat shows 3-5 MB/sec IO. The second plan runs at 30% > cpu and 15-30MB.sec IO. > > Could someone shed some light on why the huge difference in > performance? Both are doing index scans plus a filter. We have no > content_keys below -1 at this time so the queries return the same results. EXPLAIN ANALYZE gives more information than EXPLAIN, and is preferred. It uses different indexes in the two queries, and one seems to be faster than the other. Why, I can't tell yet. I would assume that you would get the fastest result if you had an index on (content_key, date_key). I don't know if pg will even use an index to speed up a <> operation. When you had > it could use the idx_pageviews_content index. Why it chose that when the other would be faster I don't know. Maybe explain analyze will give some hint. -- /Dennis ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Why is restored database faster?
On Thu, 18 Dec 2003, Shridhar Daithankar wrote: > Well, then the only issue left is file sytem defragmentation. And the internal fragmentation that can be "fixed" with the CLUSTER command. -- /Dennis ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Why is restored database faster?
On Tue, 16 Dec 2003, David Shadovitz wrote: > I backed up my database using pg_dump, and then restored it onto a different > server using psql. I see that the query "SELECT COUNT(*) FROM myTable" > executes immediately on the new server but takes several seconds on the old > one. (The servers are identical.) > > What could account for this difference? Clustering? How can I get the > original server to perform as well as the new one? You probably need to run VACUUM FULL. It locks the tables during its execution so only do it when the database is not in full use. If this helps you probably need to do normal vacuums more often and maybe tune the max_fsm_pages to be bigger. -- /Dennis ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
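The suggested cleanup, sketched using the table name from the question:

```sql
-- Reclaims the space occupied by dead row versions. It takes an
-- exclusive lock on the table, so run it when the database is quiet.
VACUUM FULL myTable;

-- Afterwards, regular plain vacuums (no FULL) keep the bloat from
-- coming back; max_fsm_pages is raised in postgresql.conf if needed.
VACUUM ANALYZE myTable;
```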
Re: [PERFORM] a lot of problems with pg 7.4
On Sat, 13 Dec 2003, Kari Lavikka wrote:

> I evaluated pg 7.4 on our development server and it looked just fine
> but performance with production loads seems to be quite poor. Most of
> performance problems are caused by nonsensical query plans

Some of the estimates that pg made in the plans you showed were way off. I assume you have run VACUUM ANALYZE recently? If that does not help, maybe you need to increase the statistics gathering on some columns so that pg makes better estimates. With the wrong statistics it's not strange that pg chooses bad plans.

--
/Dennis

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
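Raising the statistics target for a badly estimated column might look like this (the table and column names here are made up for illustration; the default target in 7.4 is 10):

```sql
-- Collect a larger sample for this column so the planner's
-- row-count estimates get closer to reality.
ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 100;

-- Rebuild the statistics with the larger sample.
ANALYZE some_table;
```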
Re: [PERFORM] Optimize
On Mon, 24 Nov 2003, Torsten Schulz wrote:

> sort_mem = 32768 # min 32

32 meg per sort can be a lot in total if you have many clients sorting things. I assume you have checked that the computer is not pushed into swapping during the peak with lots of users. A swapping computer is never fast. Using some swap space is not bad, but a lot of paging in and out of swap is not good.

--
/Dennis

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Help with count(*)
On Fri, 14 Nov 2003, Tom Lane wrote:

> I believe the previous discussion also agreed that we wanted to postpone
> the freezing of now(), which currently also happens at BEGIN rather than
> the first command after BEGIN.

Or should that happen at the first call to now()?

/me should go back and try to find this previous discussion.

--
/Dennis

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] *very* slow query to summarize data for a month ...
On Mon, 10 Nov 2003, Marc G. Fournier wrote:

> explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
> FROM company c, traffic_logs ts
> WHERE c.company_id = ts.company_id
> AND month_trunc(ts.runtime) = '2003-10-01'
> GROUP BY company_name,ts.company_id;

What if you do ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01' and add an index on traffic_logs like (runtime, company_id)?

--
/Dennis

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
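The rewrite with a plain range condition, so an ordinary btree index on runtime can be used (the month_trunc() expression in the original prevents that unless there is a functional index on it; the index name below is made up):

```sql
CREATE INDEX traffic_logs_runtime_idx ON traffic_logs (runtime, company_id);

-- Same result as the month_trunc() version for October 2003,
-- but the range condition is directly indexable.
SELECT ts.company_id, c.company_name, SUM(ts.bytes) AS total_traffic
  FROM company c, traffic_logs ts
 WHERE c.company_id = ts.company_id
   AND ts.runtime >= '2003-10-01'
   AND ts.runtime <  '2003-11-01'
 GROUP BY c.company_name, ts.company_id;
```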
Re: [PERFORM] One or more processor ?
On Fri, 10 Oct 2003, [iso-8859-15] Hervé Piedvache wrote:

> A simple question about PostgreSQL ... I have a Pentium Xeon Quadri
> processors ... If I do a SQL request ... does PostgreSQL use one or more
> processor ?

Each connection becomes a process, and each process runs on one processor. So, with only one connection you use only one processor (and the OS might use another processor). Most databases have many concurrent users, and then more processors will be used.

--
/Dennis

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] OFFTOPIC: PostgreSQL vs MySQL
On Thu, 9 Oct 2003, David Griffiths wrote:

> > > "PostgreSQL supports constraints. MySQL doesn't; programmers need to
> > > take care of that from the client side"
> > > Again, InnoDB supports constraints.
> >
> > Really? This is news. We did some tests on constraints on InnoDB, and
> > found that while they parsed, they were not actually enforced. Was
> > our test in error?
>
> You may have turned them off to load data? I've run into constraints
> when my data-load script missed some rows in address_type. When it went
> to do the address_list table, all rows that had the missing address_type
> failed, as they should. I saw no weakness in the constraints.

It sounds like you are talking about foreign keys only, while the previous writer talks about other constraints as well. For example, in postgresql you can do:

CREATE TABLE foo (
  x int,
  CONSTRAINT bar CHECK (x > 5)
);

and then

# INSERT INTO foo VALUES (4);
ERROR: ExecInsert: rejected due to CHECK constraint "bar" on "foo"

I don't know MySQL, but I've got the impression from other posts on the lists that InnoDB supports foreign keys only. I might be wrong though.

--
/Dennis

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] restore time: sort_mem vs. checkpoing_segments
On Tue, 23 Sep 2003, Bruce Momjian wrote:

> With the new warning about too-frequent checkpoints, people have actual
> feedback to encourage them to increase checkpoint_segments. One issue
> is that it is likely to recommend increasing checkpoint_segments during
> restore, even if there is no value to it being large during normal
> server operation. Should that be documented?

One could have a variable that turns off that warning, and have pg_dump insert a statement to turn it off. That is, if one never wants these warnings from a restore (from a new dump). In any case, documentation is good and still needed.

--
/Dennis

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] LIKE query running slow
On Tue, 23 Sep 2003, Garrett Bladow wrote:

> Recently we upgraded the RAM in our server. After the install a LIKE
> query that used to take 5 seconds now takes 5 minutes. We have tried the
> usual suspects, VACUUM, ANALYZE and Re-indexing.

If you mean that you reinstalled postgresql, then it's probably because you previously ran the database with the "C" locale but now run it with something else. If all you did was install the extra memory, then I don't see how that could affect it at all (especially if you have not altered postgresql.conf to make use of more memory).

--
/Dennis

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Reading data in bulk - help?
On Wed, 10 Sep 2003, Chris Huston wrote:

> Sometime before then I need to figure out how to cut the fetch times
> from the now 200ms to something like 10ms.

You didn't say anything about Josh's first point of adjusting postgresql.conf to match your machine. Settings like effective_cache_size almost always need to be increased from the default, as does shared memory.

--
/Dennis

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] The results of my PostgreSQL/filesystem performance
On Tue, 26 Aug 2003, Bill Moran wrote: > As with all performance tests/benchmarks, there are probably dozens or > more reasons why these results aren't as accurate or wonderful as they > should be. Take them for what they are and hopefully everyone can > learn a few things from them. What version of pg was used in debian and redhat? For freebsd it's 7.2.4 it says on the page, but I see nothing about the other two. The version that comes with Redhat 9 (Shrike) is 7.3.2. -- /Dennis ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Please scan your computer
Someone who has my: [EMAIL PROTECTED] email address has an infected computer, infected with the SoBig.F virus. I'm getting 200+ infected emails a day from that person(s). Go to this site and do a free online virus scan. It's safe, and done by one of the two top virus scanning companies in world. I've done it several times. http://housecall.antivirus.com/ -- Dennis Gearon ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] PostgreSQL performance problem -> tuning
On Thu, 7 Aug 2003, Richard Huxton wrote: > But this parameter controls how much memory can be allocated to sorts - I > don't see how PG can figure out a reasonable maximum by itself. One could have one setting for the total memory usage and pg could use statistics or some heuristics to use the memory for different things in a good way. Then that setting could have an auto setting so it uses 40% of all memory or something like that. Not perfect but okay for most people. -- /Dennis ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Perfomance Tuning
On Tue, 12 Aug 2003, mixo wrote:

> that I am currently importing data into Pg which is about 2.9 Gigs.
> Unfortunately, to maintain data integrity, data is inserted into a table
> one row at a time.

So you don't put a number of inserts into one transaction? If you don't, then postgresql will treat each command as a transaction and each insert is going to be forced out to disk (returning when the data is just in some cache is not safe, even if other products might do that). Otherwise the server might promise the client that the row has been stored, then go down, and the row that was only in the cache is lost. That would be much faster, but not what you expect from a real database.

So, group the inserts into transactions of maybe 1000 commands each. It will go much faster. The server can then cache the rows and at the end just make sure all 1000 have been written out to disk.

There is also a configuration variable that tells postgresql not to wait until the insert is out on disk, but that is not recommended if you value your data.

And last, why does it help integrity to insert data one row at a time?

--
/Dennis

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
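The batching Dennis describes, sketched (the table and values are placeholders):

```sql
BEGIN;
INSERT INTO mytable VALUES (1, 'first row');
INSERT INTO mytable VALUES (2, 'second row');
-- ... roughly 1000 inserts per transaction ...
COMMIT;  -- one disk sync for the whole batch instead of one per row
```

Each row is still protected by the transaction: if the server goes down mid-batch, the whole batch is rolled back rather than partially applied.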
Re: [PERFORM] Poor delete performance AFTER vacuum analyze
On Sat, 19 Jul 2003, Jeremy M. Guthrie wrote: > 100megs of new data each day. However, the instant the system finishes only > a 'vacuum analyze', the whole thing slows WAY down to where each run can take > 10-15 minutes. Have you run EXPLAIN ANALYZE on the delete query before and after the vacuum? Does it explain why it goes slower? -- /Dennis ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] index / sequential scan problem
On Fri, 18 Jul 2003, Tom Lane wrote:

> >> Adjusting the cpu_tuple_cost to 0.042 got the planner to choose the index.
>
> > Doesn't sound very good and it will most likely make other queries slower.
>
> Seems like a reasonable approach to me --- certainly better than setting
> random_page_cost to physically nonsensical values.

Hehe, just before this letter there was talk about changing random_page_cost. I kind of responded that 0.042 is not a good random page cost. But now of course I can see that it says cpu_tuple_cost :-)

Sorry for adding confusion.

--
/Dennis

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] index / sequential scan problem
On Fri, 18 Jul 2003, Fabian Kreitner wrote: > Adjusting the cpu_tuple_cost to 0.042 got the planner to choose the index. Doesn't sound very good and it will most likely make other queries slower. You could always turn off sequential scan before that query and turn it on after. > Anything I need to consider when raising it to such "high" values? You could fill the table with more data and it will probably come to a point where it will stop using the seq. scan. You could of course also change pg itself so it calculates a better estimate. -- /Dennis ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
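Turning off sequential scans around just the one query, as suggested, is done per session (not a permanent fix, only a way to force the index plan):

```sql
SET enable_seqscan = off;
-- run the problematic query here; the planner now avoids seq scans
SET enable_seqscan = on;
```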
Re: [PERFORM] Extreme high load averages
On Sun, 6 Jul 2003, Martin Foster wrote:

> The processor seems to be purposely sitting there twiddling it's thumbs.
> Which leads me to believe that perhaps the nice levels have to be
> changed on the server itself?

It could also be all the usual things that affect performance. Are your queries using indexes where they should? Do you vacuum analyze after you have updated/inserted a lot of data? It could be that some of your queries are not as efficient as they should be, like doing a sequential scan over a table instead of an index scan. That translates into more IO and slower response times, especially when you have more connections fighting for the available IO.

--
/Dennis

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly