Re: [GENERAL] Statistics collection question
Well, first question: how can I check if autovacuum is working?

On 04/09/2007, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> > Basically, what I am missing is some info on actually tweaking the
> > postgresql.conf to suit my system.
>
> No, that's *not* what you're missing. I'm not sure what the problem
> is in your system, but I'm pretty sure that everything you have
> frantically been tweaking is unrelated if not outright
> counterproductive. You need to stop tweaking and start some methodical
> evidence-gathering to figure out what the problem actually is.
>
> Here are some things I would suggest trying:
>
> 1. Do a VACUUM VERBOSE when the system is fast, and save the output.
> When the system is slow, do another VACUUM VERBOSE, and compare file
> sizes to see if anything seems markedly bloated. (It might be less
> labor-intensive to copy the pg_class.relname, reltuples, relpages
> columns into another table for safekeeping after the first VACUUM,
> and use SQL queries to look for markedly different sizes after the
> second VACUUM.)

Did this. Saved the output as text files. Did not find much difference
for the tables and indexes. The number of pages required overall
remains the same, by and large. Do I also need to compare the
"pg_toast" entries?

> 2. Set up a task to dump the results of
>      select * from pg_locks, pg_stat_activity where pid = procpid
> into a log file every few seconds. Compare what you see when things
> are fast with when they are slow. In particular you should fairly
> easily be able to tell if the slow queries are waiting long for locks.

Yes, did this. Saved the snapshots into four different tables (scores1
through scores4, where scores1 represents a time when queries were
superfast and scores4 a time when they were pathetically slow). Then
joined them all, two at a time, to track differences.
The only four rows that are different across these four tables are
related to my two major tables:

# select scores4.relname, scores4.reltuples, scores4.relpages,
         scores1.relpages
  from scores4 left join scores1 on scores4.relname = scores1.relname
  where scores4.relpages <> scores1.relpages;

        relname        |  reltuples  | relpages | relpages
-----------------------+-------------+----------+----------
 idx_trads_userid      | 2.82735e+06 |    11652 |    11644
 idx_trads_modifydate  | 2.82735e+06 |     7760 |     7744
 tradcount             |      201349 |     1391 |     1388
 trads_alias_key       | 2.82735e+06 |    16172 |    16135
(6 rows)

Time: 2.073 ms

What do I make of this? From what I observe, some of the indexes have
a few more values, and a few more pages for them. This is exactly how
it should be, right? This is from a small database.

> 3. Log the output of "vmstat 1" over time, compare fast and slow
> periods.

Following is the vmstat from a slow time:

~ > vmstat 1
procs -----------memory---------- ---swap-- ----io---- --system-- ----cpu-----
 r  b   swpd   free  buff   cache   si  so    bi    bo    in   cs  us sy id wa
 0 12  14136  15608  5208 3554516    0   0   200  1408     7    2   1 86 12
 0 14  14136  17208  5200 3552964    0   0     0    52  1137  372   0  0 23 77
 0 15  14136  17336  5204 3551140    0   0     0    60  1085  237   0  0 10 89
 0 16  14136  16832  5204 3551140    0   0    64     0  1108  323   0  0 25 75
 0 15  14136  15872  5204 3551140    0   0     0     0  1066  242   0  0 25 75
 0 16  14136  17360  5196 3546468    0   0   492   304  1144  570   1  1 29 69
 0 17  14152  17744  5192 3542816    0  48     0   188  1127  169   1  0 25 74
 0 10  14172  23312  5216 3540432    0   0   528   292  1244  453   0  1 25 74
 2  3  14064  15888  5276 3550148    0   0  6644   964  1192  427   1  1 65 33
 0  2  13840  16656  5232 3548596    0   0 24708    60  1413  882   1  2 75 23

Not sure how to read this. We're on 4GB RAM. Thanks.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
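The autovacuum question at the top of this message can be checked from the catalogs. A sketch for the 8.x series under discussion (pg_stat_user_tables only gained the last_autovacuum/last_autoanalyze columns in 8.2, and pg_stat_activity's columns were still called procpid and current_query back then; they became pid and query in 9.2):

```sql
-- Is autovacuum enabled at all?
SHOW autovacuum;

-- 8.2+: when did (auto)vacuum and (auto)analyze last touch each table?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;

-- Is a vacuum running right now?
SELECT procpid, current_query
FROM pg_stat_activity
WHERE current_query LIKE '%VACUUM%';
```

If last_autovacuum stays NULL for tables that take heavy updates, autovacuum is likely not running (or its thresholds are never being reached).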
Re: [GENERAL] Statistics collection question
On Mon, Sep 10, 2007 at 07:05:54PM -, [EMAIL PROTECTED] wrote:
> When I do a "select * from pg_locks", some of them show up as
> "ExclusiveLock". This I suppose means that the whole table is locked,
> right? How can I find from the "transaction id" which precise SQL
> statement is taking this time? I do not have anything that should!
> Simple SELECT, INSERT and UPDATE stuff in our fairly straightforward
> application, and I hope that autovacuum and auto-analyze do not take
> these exclusive locks?

Note: the example ExclusiveLock you showed is merely the transaction
holding an exclusive lock on itself. As you can see, there is no
database or relation mentioned, so it's not locking anything else. It
also has a shared lock on a table, but that's normal.

For more info on the activity, try "select * from pg_stat_activity;"

Hope this helps,
--
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.

signature.asc
Description: Digital signature
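To answer the quoted question about mapping a lock back to a SQL statement: pg_locks carries the backend PID, which joins to pg_stat_activity. A sketch using the 8.x-era column names (procpid/current_query; renamed pid/query in 9.2):

```sql
-- Which backend holds or awaits each lock, and what is it running?
SELECT l.locktype,
       l.relation::regclass AS locked_relation,
       l.mode,
       l.granted,
       a.procpid,
       a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
ORDER BY l.granted, a.procpid;
```

Rows with granted = false are the interesting ones: those backends are waiting, and the granted rows on the same relation show who is blocking them.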
Re: [GENERAL] Statistics collection question
On Sep 4, 10:54 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> > Would appreciate any help. Why do indexed queries take so much time?
> > It's a simple DB with "10 relations" including tables and indexes.
> > Simple inserts and updates, about 5000 a day, but non-trivial
> > concurrent selects (about 45 million a day). Works fine when I
> > restart, but a day later all goes cattywumpus.
>
> BTW, just to be perfectly clear: all you do is stop and restart the
> postmaster (using what commands exactly?), and everything is fast again?
> That's sufficiently unheard-of that I want to be entirely sure we
> understood you correctly.

Yes, I noticed that starting the postgres database again had an effect
on speed. But this does not seem to be working anymore, so I suppose
something else needs fixing.

When I do a "select * from pg_locks", some of them show up as
"ExclusiveLock". This I suppose means that the whole table is locked,
right? How can I find from the "transaction id" which precise SQL
statement is taking this time? I do not have anything that should!
Simple SELECT, INSERT and UPDATE stuff in our fairly straightforward
application, and I hope that autovacuum and auto-analyze do not take
these exclusive locks?

Ref: output of the select from pg_locks --

=# select * from pg_locks;
-[ RECORD 1 ]-+--------------
locktype      | transactionid
database      |
relation      |
page          |
tuple         |
transactionid | 4700
classid       |
objid         |
objsubid      |
transaction   | 4700
pid           | 21989
mode          | ExclusiveLock
granted       | t
-[ RECORD 2 ]-+----------------
locktype      | relation
database      | 41249
relation      | 10328
page          |
tuple         |
transactionid |
classid       |
objid         |
objsubid      |
transaction   | 4700
pid           | 21989
mode          | AccessShareLock
granted       | t

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Statistics collection question
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> Would appreciate any help. Why do indexed queries take so much time?
> It's a simple DB with "10 relations" including tables and indexes.
> Simple inserts and updates, about 5000 a day, but non-trivial
> concurrent selects (about 45 million a day). Works fine when I
> restart, but a day later all goes cattywumpus.

BTW, just to be perfectly clear: all you do is stop and restart the
postmaster (using what commands exactly?), and everything is fast
again? That's sufficiently unheard-of that I want to be entirely sure
we understood you correctly.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [GENERAL] Statistics collection question
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> Basically, what I am missing is some info on actually tweaking the
> postgresql.conf to suit my system.

No, that's *not* what you're missing. I'm not sure what the problem
is in your system, but I'm pretty sure that everything you have
frantically been tweaking is unrelated if not outright
counterproductive. You need to stop tweaking and start some methodical
evidence-gathering to figure out what the problem actually is.

Here are some things I would suggest trying:

1. Do a VACUUM VERBOSE when the system is fast, and save the output.
When the system is slow, do another VACUUM VERBOSE, and compare file
sizes to see if anything seems markedly bloated. (It might be less
labor-intensive to copy the pg_class.relname, reltuples, relpages
columns into another table for safekeeping after the first VACUUM, and
use SQL queries to look for markedly different sizes after the second
VACUUM.)

2. Set up a task to dump the results of
     select * from pg_locks, pg_stat_activity where pid = procpid
into a log file every few seconds. Compare what you see when things
are fast with when they are slow. In particular you should fairly
easily be able to tell if the slow queries are waiting long for locks.

3. Log the output of "vmstat 1" over time, compare fast and slow
periods.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
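Suggestion 1 above can be scripted; a sketch (the snapshot table name is made up for illustration, and the 2x threshold is an arbitrary starting point):

```sql
-- On the fast system, right after the first VACUUM:
CREATE TABLE relpages_snapshot AS
SELECT relname, reltuples, relpages FROM pg_class;

-- On the slow system, right after the second VACUUM:
SELECT c.relname,
       s.relpages AS pages_before,
       c.relpages AS pages_now
FROM pg_class c
JOIN relpages_snapshot s USING (relname)
WHERE c.relpages > s.relpages * 2
ORDER BY c.relpages DESC;
```

Any relation whose page count grew far faster than its tuple count is a bloat candidate.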
Re: [GENERAL] Statistics collection question
Phoenix Kiula wrote:
> On 04/09/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> I'm wondering about some transaction taking exclusive lock on the table
>> and sitting on it for a minute or so, and also about network problems
>> delaying transmission of data to the client.
>
> How can I check what is causing the lack? When I restart pgsql it goes

You mean the lock? You can check for active locks by querying pg_locks.

> away. The log is empty for a day or two (I'm only logging errors or
> slow queries) and the queries are super fast, but after a day it
> starts filling up with abysmally slow queries, even on simple queries
> with WHERE clauses that have only one constant on the indexed
> column!

That's new information that we could have used earlier, as it means
that postgres does pick the right plan (at least initially) and things
like the network and DNS apparently work.

Was the explain analyze you sent from the super fast periods or from a
slow period? It'd be interesting to see a query plan of a problematic
query. I suppose if you try one of your super fast queries it is slow
once other queries slow down too? I ask because I expect that query not
to be in the cache at that moment, so it could be a good candidate for
an explain analyze.

> Basically, what I am missing is some info on actually tweaking the
> postgresql.conf to suit my system. I run Apache, MySQL, Postgres, and
> Exim (email server) on the same dedicated hosting server. I don't mind
> if Postgres hogs 2GB of memory, but I need to know how to tweak it. I
> have made about eight posts on this list with my entire
> postgresql.conf posted in. I have read and re-read the manual and

Yes, but you gave us conflicting information. Only now is it clear what
your problem is.

> that compiles all of it and presents the system's missteps and
> guidelines may be useful, a la the "Tuning Primer" script from the
> MySQL camp) but I am not sure where to begin!

I've seen pgAdmin III doing quite a nice job at that. Haven't really
used it myself, I usually prefer the command line.

> Would appreciate any help. Why do indexed queries take so much time?
> It's a simple DB with "10 relations" including tables and indexes.
> Simple inserts and updates, about 5000 a day, but non-trivial

It looks like your indexes get bloated. Do you vacuum enough? It'd be a
good idea to at least analyze the tables involved in those inserts
regularly. If you do those inserts in a batch, be sure to call ANALYZE
after committing that batch. That helps quite a bit.

Besides that... How are those disks configured? You didn't put them in
a RAID-5 array, I hope? That wouldn't explain the above problem, but it
would slow things down (as has been mentioned on this list a few times)
and may thus be exacerbating the problem.

Good luck!

--
Alban Hertroys
[EMAIL PROTECTED]
magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede
// Integrate Your World //

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
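The batch-then-ANALYZE advice above, sketched (the table and column names are borrowed from the queries quoted elsewhere in the thread; the values are made up, and the INSERT assumes id is filled in by a sequence default):

```sql
BEGIN;
INSERT INTO trades (t_alias, t_info, dstats, status)
VALUES ('1q8bf', 'some info', 'Y', 'Y');
-- ... the rest of the batch ...
COMMIT;

-- Refresh the planner's statistics once the batch is committed:
ANALYZE trades;
```

Running ANALYZE after the commit (not inside the transaction) ensures the sampled rows are the ones other sessions will actually see.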
Re: [GENERAL] Statistics collection question
On Tue, Sep 04, 2007 at 03:07:41PM +0800, Phoenix Kiula wrote:
> How can I check what is causing the lack? When I restart pgsql it goes
> away. The log is empty for a day or two (I'm only logging errors or
> slow queries) and the queries are super fast, but after a day it
> starts filling up with abysmally slow queries, even on simple queries
> with WHERE clauses that have only one constant on the indexed
> column!

Check that you're not running VACUUM FULL anywhere, and post the
(complete) output of VACUUM VERBOSE run as superuser. That way we can
rule out index/table bloat.

Have a nice day,
--
Martijn van Oosterhout <[EMAIL PROTECTED]> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.

signature.asc
Description: Digital signature
Re: [GENERAL] Statistics collection question
Sounds like index bloat to me ... lots of updates of indexed columns =
lots of extra dead index entries. Since IIRC PostgreSQL indexes
(indicii?) don't store information about the "liveness" of the
referenced rows, indexed reads would have to sort through a lot of
dead wood to find the few live index entries.

If you can, try to schedule a few minutes of down time every N hours
and reindex the affected tables, followed by a vacuum/analyze to
reclaim dead space and update stats, maybe? Admittedly hard, but
perhaps easier to have 5-10 minutes of down time regularly rather than
very slow queries for hours on end. If this works even as a temporary
solution, it might point the way to a better long-term fix.

It sounds as if you have too many services on one server -- the
contention of each for memory and disk I/O would worry me a lot. I
tend to like having dedicated DB servers except for certain
light-weight development environments.

And stop trying to make PostgreSQL into MySQL, or vice versa. Different
engines, different regimes. Not translatable, methinks.

Just sodden thoughts ... sorry for top posting (challenged email tool).

Greg Williamson
Senior DBA
GlobeXplorer LLC, a DigitalGlobe company

Confidentiality Notice: This e-mail message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
confidential and privileged information and must be protected in
accordance with those provisions. Any unauthorized review, use,
disclosure or distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply e-mail and destroy all
copies of the original message. (My corporate masters made me say this.)
-Original Message-
From: [EMAIL PROTECTED] on behalf of Phoenix Kiula
Sent: Tue 9/4/2007 1:07 AM
To: Tom Lane
Cc: Richard Broersma Jr; Alban Hertroys; Postgres General
Subject: Re: [GENERAL] Statistics collection question
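Greg's scheduled-downtime idea, sketched (REINDEX takes an exclusive lock on the table, hence the quiet window; trades is the table name used elsewhere in the thread):

```sql
-- During the maintenance window:
REINDEX TABLE trades;   -- rebuild every index on the table, dropping dead entries
VACUUM ANALYZE trades;  -- reclaim dead heap space and refresh statistics
```

If the queries are fast again afterwards, index bloat was the culprit and more aggressive routine vacuuming is the longer-term fix.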
Re: [GENERAL] Statistics collection question
On 04/09/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> > On 04/09/07, Richard Broersma Jr <[EMAIL PROTECTED]> wrote:
> >> That's odd, I wonder why the EXPLAIN ANALYZE time is so much less
> >> than the logged select statement times?
>
> > Because the statement has been executed and is in the cache.
>
> That answer is way too flippant. In particular it doesn't explain your
> repeated 80sec queries --- you should have enough memory in that thing
> to be caching a fair amount of your data.
>
> I'm wondering about some transaction taking exclusive lock on the table
> and sitting on it for a minute or so, and also about network problems
> delaying transmission of data to the client.

How can I check what is causing the lack? When I restart pgsql it goes
away. The log is empty for a day or two (I'm only logging errors or
slow queries) and the queries are super fast, but after a day it
starts filling up with abysmally slow queries, even on simple queries
with WHERE clauses that have only one constant on the indexed column!

As for "network problems delaying transmission of data" -- not sure
what this means. MySQL is super fast on the very same system. Does
pgsql require anything different?

Basically, what I am missing is some info on actually tweaking the
postgresql.conf to suit my system. I run Apache, MySQL, Postgres, and
Exim (email server) on the same dedicated hosting server. I don't mind
if Postgres hogs 2GB of memory, but I need to know how to tweak it. I
have made about eight posts on this list with my entire
postgresql.conf posted in. I have read and re-read the manual and
devoured as many google-groups archives of this list as I possibly can.

I am looking at plenty of catalogue and stats tables (a utility that
compiles all of it and presents the system's missteps and guidelines
may be useful, a la the "Tuning Primer" script from the MySQL camp)
but I am not sure where to begin!

Would appreciate any help. Why do indexed queries take so much time?
It's a simple DB with "10 relations" including tables and indexes.
Simple inserts and updates, about 5000 a day, but non-trivial
concurrent selects (about 45 million a day). Works fine when I
restart, but a day later all goes cattywumpus.

TIA!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [GENERAL] Statistics collection question
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> On 04/09/07, Richard Broersma Jr <[EMAIL PROTECTED]> wrote:
>> That's odd, I wonder why the EXPLAIN ANALYZE time is so much less
>> than the logged select statement times?

> Because the statement has been executed and is in the cache.

That answer is way too flippant. In particular it doesn't explain your
repeated 80sec queries --- you should have enough memory in that thing
to be caching a fair amount of your data.

I'm wondering about some transaction taking exclusive lock on the table
and sitting on it for a minute or so, and also about network problems
delaying transmission of data to the client.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Statistics collection question
On 04/09/07, Richard Broersma Jr <[EMAIL PROTECTED]> wrote:
> --- Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> > LOG: duration: 93473.282 ms statement: select t_info, dstats, id
> > from trades where t_alias = '17huv' and status = 'Y'
> >
> > ---
> >
> > Index Scan using trades_unique_t_alias on trades (cost=0.00..3.41
> > Time: 2.990 ms
>
> That's odd, I wonder why the EXPLAIN ANALYZE time is so much less
> than the logged select statement times?

Because the statement has been executed and is in the cache.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Statistics collection question
--- Phoenix Kiula <[EMAIL PROTECTED]> wrote:
> LOG: duration: 93473.282 ms statement: select t_info, dstats, id
> from trades where t_alias = '17huv' and status = 'Y'
>
> ---
>
> Index Scan using trades_unique_t_alias on trades (cost=0.00..3.41
> Time: 2.990 ms

That's odd, I wonder why the EXPLAIN ANALYZE time is so much less than
the logged select statement times?

Regards,
Richard Broersma Jr.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Statistics collection question
On 03/09/07, Alban Hertroys <[EMAIL PROTECTED]> wrote:
> Phoenix Kiula wrote:
> As I understand it, it's a sample of how the data is distributed.
> Probably it's based on statistical mathematics that specifies a
> minimum size for a representative sample of a given data set. It
> boils down to: "If you want to know how many people like vanilla ice
> cream, how many people do you need to ask their preference?".

Thanks for this explanation. So should I set the statistics on my
indexed column to 10 or so? I made it 1000 this morning, trying to see
how it would affect performance.

> That's definitely not normal. I have a smallish table here containing
> 2.5 million records, and querying for one with a specific index takes
> 141 micro(!) seconds. The hardware involved is a dual opteron with 4G,
> in a xen domain; I don't know what disks are used, but I doubt they're
> raptors.
>
> So something is wrong with your setup, that much is obvious. I sincerely
> doubt that postgres is to blame here.
>
> You did check that you're not connecting through the internet and
> getting a DNS timeout?

I am getting these times from the postgres log (pglog). I have set the
minimum logged query time to 5000 ms. Here is an excerpt from the log,
which is constantly updated with more and more of these! It looks
abysmal!!
---
LOG: duration: 85865.904 ms statement: select t_info, dstats, id from trades where t_alias = '1q8bf' and status = 'Y'
LOG: duration: 83859.505 ms statement: select t_info, dstats, id from trades where t_alias = '1a7iv' and status = 'Y'
LOG: duration: 71922.423 ms statement: select t_info, dstats, id from trades where t_alias = 'bvu' and status = 'Y'
LOG: duration: 74924.741 ms statement: select t_info, dstats, id from trades where t_alias = 'nt3g' and status = 'Y'
LOG: duration: 82471.036 ms statement: select t_info, dstats, id from trades where t_alias = '15p8m' and status = 'Y'
LOG: duration: 90015.410 ms statement: select t_info, dstats, id from trades where t_alias = 'pkfi' and status = 'Y'
LOG: duration: 72713.815 ms statement: select t_info, dstats, id from trades where t_alias = 'evdi' and status = 'Y'
LOG: duration: 88054.444 ms statement: select t_info, dstats, id from trades where t_alias = '1a8zj' and status = 'Y'
LOG: duration: 94502.678 ms statement: select t_info, dstats, id from trades where t_alias = '1d188' and status = 'Y'
LOG: duration: 82178.724 ms statement: select t_info, dstats, id from trades where t_alias = 'q8zu' and status = 'Y'
LOG: duration: 107030.741 ms statement: select t_info, dstats, id from trades where t_alias = 'jnzu' and status = 'Y'
LOG: duration: 87634.723 ms statement: select t_info, dstats, id from trades where t_alias = 'tav9' and status = 'Y'
LOG: duration: 104271.695 ms statement: select t_info, dstats, id from trades where t_alias = '37tk7' and status = 'Y'
LOG: duration: 88726.671 ms statement: select t_info, dstats, id from trades where t_alias = 'tavc' and status = 'Y'
LOG: duration: 74710.120 ms statement: select t_info, dstats, id from trades where t_alias = '1q8zu' and status = 'Y'
LOG: duration: 93100.863 ms statement: select t_info, dstats, id from trades where t_alias = '1ovmc' and status = 'Y'
LOG: duration: 83659.489 ms statement: select t_info, dstats, id from trades where t_alias = '1p9ub' and status = 'Y'
LOG: duration: 71963.413 ms statement: select t_info, dstats, id from trades where t_alias = '9awlia' and status = 'Y'
LOG: duration: 83569.602 ms statement: select t_info, dstats, id from trades where t_alias = '2yeza' and status = 'Y'
LOG: duration: 93473.282 ms statement: select t_info, dstats, id from trades where t_alias = '17huv' and status = 'Y'
---

By way of explanation, T_INFO is a text column, DSTATS is char(1), and
ID is the bigint primary key. STATUS can be 'Y' or 'N', so I have not
included it in the index (not selective enough), but T_ALIAS is the
unique index. The EXPLAIN ANALYZE output is as follows:

MYUSER=# explain analyze select t_info, dstats, id from trades
         where t_alias = '17huv' and status = 'Y';

                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using trades_unique_t_alias on trades
     (cost=0.00..3.41 rows=1 width=110)
     (actual time=0.100..0.104 rows=1 loops=1)
   Index Cond: ((t_alias)::text = '17huv'::text)
   Filter: (status = 'Y'::bpchar)
 Total runtime: 0.166 ms
(4 rows)

Time: 2.990 ms

And my postgresql.conf looks like this:

max_connections = 350
shared_buffers = 21000        # Not much more than 20k...
                              # http://www.revsys.com/writings/postgresql-performance.html
effective_cache_size = 128000
max_fsm_relations = 100
max_fsm_pages = 15
work_mem = 16000              # http:
Re: [GENERAL] Statistics collection question
Phoenix Kiula wrote:
> Lots of posts here in response to performance questions have the
> recommendation "increase the stats on that column". From whatever
> succinct reading is made available on the postgres site, I gather that
> this aids the planner in getting some info about some of the data. Am
> I missing something here, or totally off-base?

As I understand it, it's a sample of how the data is distributed.
Probably it's based on statistical mathematics that specifies a minimum
size for a representative sample of a given data set. It boils down to:
"If you want to know how many people like vanilla ice cream, how many
people do you need to ask their preference?".

> The issue is that I don't quite get why MySQL can fetch one indexed
> row (i.e., SQL that ends with a very simple "WHERE indexed_column =
> 'constant'") in a matter of milliseconds, but PgSQL is taking 5 to 6
> seconds on average, at least for the first time. I use RAPTOR 15K
> drives; they're not SCSI, but they're not exactly "cheap disks"
> either. And I have 4GB RAM. The explain select shows that the index
> is being used!

That's definitely not normal. I have a smallish table here containing
2.5 million records, and querying for one with a specific index takes
141 micro(!) seconds. The hardware involved is a dual opteron with 4G,
in a xen domain; I don't know what disks are used, but I doubt they're
raptors.

So something is wrong with your setup, that much is obvious. I
sincerely doubt that postgres is to blame here.

You did check that you're not connecting through the internet and
getting a DNS timeout?

Regards,
--
Alban Hertroys
[EMAIL PROTECTED]
magproductions b.v.
T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede
// Integrate Your World //

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
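The per-column sample size discussed above is tunable. A sketch using the table and column from the logged queries earlier in the thread:

```sql
-- Raise the statistics target for the hot column, then re-sample:
ALTER TABLE trades ALTER COLUMN t_alias SET STATISTICS 100;
ANALYZE trades;

-- Inspect what the planner now knows about the column:
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'trades' AND attname = 't_alias';
```

For a unique column like t_alias, a large target buys little (n_distinct will show -1, meaning "all values distinct"), which is why jumping from the default to 1000 is unlikely to change the plans here.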
Re: [GENERAL] Statistics collection question
Phoenix Kiula escribió:
> On 03/09/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> > "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> > most_common_vals will (and should) be empty if there aren't actually
> > any common values, but aren't you getting a histogram? Exactly what
> > performance do you think will be improved?
>
> Lots of posts here in response to performance questions have the
> recommendation "increase the stats on that column". From whatever
> succinct reading is made available on the postgres site, I gather that
> this aids the planner in getting some info about some of the data. Am
> I missing something here, or totally off-base?
>
> The issue is that I don't quite get why MySQL can fetch one indexed
> row (i.e., SQL that ends with a very simple "WHERE indexed_column =
> 'constant'") in a matter of milliseconds, but PgSQL is taking 5 to 6
> seconds on average, at least for the first time. I use RAPTOR 15K
> drives; they're not SCSI, but they're not exactly "cheap disks"
> either. And I have 4GB RAM. The explain select shows that the index
> is being used!

Let's see the explain output? I doubt your games with stats have
anything to do with it. Maybe it is having to scan a lot of dead tuples
or something like that.

--
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] Statistics collection question
On 03/09/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> most_common_vals will (and should) be empty if there aren't actually
> any common values, but aren't you getting a histogram? Exactly what
> performance do you think will be improved?

Lots of posts here in response to performance questions have the
recommendation "increase the stats on that column". From whatever
succinct reading is made available on the postgres site, I gather that
this aids the planner in getting some info about some of the data. Am
I missing something here, or totally off-base?

The issue is that I don't quite get why MySQL can fetch one indexed
row (i.e., SQL that ends with a very simple "WHERE indexed_column =
'constant'") in a matter of milliseconds, but PgSQL is taking 5 to 6
seconds on average, at least for the first time. I use RAPTOR 15K
drives; they're not SCSI, but they're not exactly "cheap disks" either.
And I have 4GB RAM. The explain select shows that the index is being
used!

TIA.

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] Statistics collection question
"Phoenix Kiula" <[EMAIL PROTECTED]> writes:
> A couple of questions about the "most_common_vals" stuff in pg_stats
> for a high traffic table:

> 1. Can I tell the stats collector to collect only values of a column
> where a certain regex is matched?

Not directly, but you could set up a partial index defined that way,
and ANALYZE would collect stats on the index contents. Whether the
planner could actually do anything with the information is another
story; I suspect you're wasting your time with this idea.

> 2. Secondly, for a unique column in the table, will the
> "most_common_vals" always be -1? I guess this could make sense, but I
> was wondering if the stats collector could somehow collect at least
> 1000 unique values to improve at least some performance.

most_common_vals will (and should) be empty if there aren't actually
any common values, but aren't you getting a histogram? Exactly what
performance do you think will be improved?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend
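The partial-index idea above, sketched (the predicate and index name are made up purely for illustration; the table and column come from elsewhere in the thread):

```sql
-- Index only the rows whose alias matches the pattern of interest;
-- ANALYZE then re-samples the table with that index in place.
CREATE INDEX idx_trades_alias_digits
    ON trades (t_alias)
    WHERE t_alias ~ '^[0-9]';
ANALYZE trades;
```

The planner will consider such an index only for queries whose WHERE clause provably implies the index predicate, which is the caveat behind "whether the planner could actually do anything with the information".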