Re: [PERFORM] : Tracking Full Table Scans
Yes. I am looking for the justified full table scans. If bigger tables are getting scanned, I would like to know the percentage of rows scanned against the percentage of rows returned as the output. If the query needs 80% of the rows as the output, then a full table scan is almost always better. I believe there is a possibility for this in Postgres. I think we can get this using pg_stat_user_tables, pg_statio_user_tables and pg_stats. I will post the calculation once I get it.

Thanks
VB

On Wed, Sep 28, 2011 at 6:25 AM, Craig Ringer wrote:
> On 09/28/2011 12:26 AM, Venkat Balaji wrote:
>> Thanks a lot Kevin !!
>>
>> Yes. I intended to track full table scans first to ensure that only
>> small tables or tables with very less pages are (as you said) getting
>> scanned full.
>
> It can also be best to do a full table scan of a big table for some
> queries. If the query needs to touch all the data in a table - for example,
> for an aggregate - then the query will often complete fastest and with less
> disk use by using a sequential scan.
>
> I guess what you'd really want to know is to find out about queries that do
> seqscans to match relatively small fractions of the total tuples scanned, ie
> low-selectivity seqscans. I'm not sure whether or not it's possible to
> gather this data with PostgreSQL's current level of stats detail.
>
> --
> Craig Ringer
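In the meantime, a rough cut at that calculation from pg_stat_user_tables alone (a sketch only: seq_tup_read counts tuples read by sequential scans, which is close to but not exactly "rows scanned", and n_live_tup is only as fresh as the last analyze):

```sql
-- Approximate % of the table read by an average sequential scan.
SELECT relname,
       seq_scan,
       n_live_tup,
       round(100.0 * (seq_tup_read::numeric / seq_scan) / n_live_tup, 2)
           AS pct_rows_per_seq_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND n_live_tup > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```

Note that the stats can't tell how many of those rows the query actually returned, which is the part Craig points out isn't tracked.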
Re: [PERFORM] : Tracking Full Table Scans
Thanks Kevin !! I will have a look at the source tree. Regards VB On Tue, Sep 27, 2011 at 10:45 PM, Kevin Grittner < kevin.gritt...@wicourts.gov> wrote: > Venkat Balaji wrote: > > > I would like to know the difference between "n_tup_upd" and > > "n_tup_hot_upd". > > A HOT update is used when none of the updated columns are used in an > index and there is room for the new tuple (version of the row) on > the same page as the old tuple. This is faster for a number of > reasons, and cleanup of the old tuple is a little different. > > If you want the gory implementation details, take a look at this > file in the source tree: > > src/backend/access/heap/README.HOT > > -Kevin >
Re: [PERFORM] overzealous sorting?
On Tuesday 27 September 2011 19:22, Mark Kirkwood wrote: > > The query that I've shown is one of a sequence of queries with the > > timestamp range progressing in steps of 1 hour through the timestamp > > range. All I want PG to do is find the range in the index, find the > > matching records in the table and return them. All of the planner's > > cleverness just seems to get in the way. > > It is not immediately clear that the planner is making the wrong choices > here. Index scans are not always the best choice, it depends heavily on > the correlation of the column concerned to the physical order of the > table's heap file. I suspect the reason for the planner choosing the > bitmap scan is that said correlation is low (consult pg_stats to see). > Now if you think that the table's heap data is cached anyway, then this > is not such an issue - but you have to tell the planner that by reducing > random_page_cost (as advised previously). Give it a try and report back! > > regards > > Mark I don't expect that any of it is cached. It is supposed to be a once-a-day linear scan of a slice of the table. The correlation on the timestamp is reported as 0.0348395. I can't use cluster since it would lock the table for too long. I would try a cursor but my framework for this case doesn't support cursors. In a later version of the framework I've tried cursors and haven't found them to be faster than reading in slices, in the tests I've done. Anyway at the moment it is fast enough. Thanks -- Anthony Shipman | It's caches all the way anthony.ship...@symstream.com | down. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] : Tracking Full Table Scans
On 09/28/2011 12:26 AM, Venkat Balaji wrote: Thanks a lot Kevin !! Yes. I intended to track full table scans first to ensure that only small tables or tables with very less pages are (as you said) getting scanned full. It can also be best to do a full table scan of a big table for some queries. If the query needs to touch all the data in a table - for example, for an aggregate - then the query will often complete fastest and with less disk use by using a sequential scan. I guess what you'd really want to know is to find out about queries that do seqscans to match relatively small fractions of the total tuples scanned, ie low-selectivity seqscans. I'm not sure whether or not it's possible to gather this data with PostgreSQL's current level of stats detail. -- Craig Ringer
Re: [PERFORM] postgres constraint triggers
On 09/27/2011 12:54 PM, Ben Chobot wrote: On Sep 26, 2011, at 10:52 AM, Maria L. Wilson wrote: Our first try to solve this problem has been to convert these triggers into a constraint trigger which allows for DEFERRABLE INITIALLY DEFERRED flags. This, we are finding, is forcing the trigger function to run after the triggering transaction is completed. We believe this will fix our locking problem and hopefully speed up our inserts again. Any comments or past experiences would certainly be helpful! My memory is fuzzy but as I recall, a possible downside to using deferred constraints was increased memory usage That's right. PostgreSQL doesn't currently support spilling of pending constraint information to disk; it has to keep it in RAM, and with sufficiently huge deferred updates/inserts/deletes it's possible for the backend to run out of RAM to use. though I cannot see how at the moment. A list of which triggers to run, and on which tuples, must be maintained until those triggers are fired. That list has to be kept somewhere. -- Craig Ringer
Re: [PERFORM] : Tracking Full Table Scans
Venkat Balaji wrote: > I would like to know the difference between "n_tup_upd" and > "n_tup_hot_upd". A HOT update is used when none of the updated columns are used in an index and there is room for the new tuple (version of the row) on the same page as the old tuple. This is faster for a number of reasons, and cleanup of the old tuple is a little different. If you want the gory implementation details, take a look at this file in the source tree: src/backend/access/heap/README.HOT -Kevin
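For watching this in practice, a quick sketch against pg_stat_user_tables (both counters are cumulative since the last stats reset):

```sql
-- What fraction of updates took the HOT path, per table.
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC;
```

A low percentage on a heavily updated table can be a hint to lower that table's fillfactor so new tuple versions have room on the same page.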
Re: [PERFORM] Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3
Hi Sam,

The purpose of this (framework generated) code is to find out if there is at least one row that has one of the selected hts_code_ids. We don't care about anything that's returned other than whether at least one row exists or not (rewriting the query with EXISTS generates that same plan). The actual selectivity can vary greatly anywhere from 0 to > 500k rows depending on the codes chosen. On the higher end of the range a select count(*) takes ~14 times longer than a limit 1 query that does an index scan (~475ms vs. 36ms).

What we're doing now for the moment is putting a straitjacket on the planner

begin ;
set local enable_seqscan = off ;
SELECT "exp_detls".id FROM "exp_detls" WHERE ("exp_detls"."hts_code_id" IN (...)) LIMIT 1;
commit ;

Even when the where clause selects a good fraction of the table, seq_scan has highly variable performance because of the clustered index (it depends on what was last selected out of the table), so we pretty much never want to use it (maybe the planner should be taking the cluster into consideration?). What we'll probably move to is maintaining a table of ids that are present in that column and running the query against that, as the above, while acceptable, can still be on the slow side when the where clause is not very selective and many rows are scanned in the index before the limit can be applied. Would be nice if the bitmap index scan could be done piecemeal alternating with heap scan when a tight limit is present so not so much work has to be done, but I could see that being really problematic to implement and use.

Tim

On Tue, Sep 27, 2011 at 1:06 AM, Samuel Gendler wrote: > > > On Mon, Sep 26, 2011 at 2:11 PM, Timothy Garnett wrote: > >> >> Though maybe in a lot of common use situations people only supply values >> that are known present so maybe this would make things worse more often then >> better (maybe limit 1 or better EXISTS would be a hint the value is not >> known present). 
Experimenting a bit it doesn't seem to matter which values >> are selected so it's not taking into account any kind of distribution over >> the histogram boundaries. > > > If I'm not mistaken, the problem here is actually the LIMIT 1, yes? The > planner is opting for the sequential scan because it assumes it will > interrupt the scan relatively quickly when a row is matched? So in the case > where you are really looking for existence, perhaps the better solution is > to select a count of the number of matching rows (perhaps grouped by id so > you know which ones match)? That would emulate the behaviour of select > without a limit, which would be more likely to use the index. It all depends > on just what you are actually doing with the row you are returning, of > course, and if there is some other way to get it once you know of its > existence. > > SELECT count(1), exp_detls.id FROM exp_detls WHERE (exp_detls.hts_code_id > IN (12,654)) GROUP BY exp_detls.id > > might work, depending upon how many different values of exp_detls.id you > are likely to see for any given set of hts_code_ids. Actually, I know > little about the query planner, but it seems to me that the aggregation and > grouping might be sufficient to force it away from preferring the sequential > scan, even if you leave a 'limit 1' on the query, since it will have to find > more than 1 row in order to return a single row, since that single row > contains an aggregate. So if your concern is about the potential of > transferring millions of rows across the network, I think that might fix it, > though it is definitely a kludge. Of course, the downside is that the index > won't be as fast as a sequential scan in the cases where the scan does get > interrupted quickly, but you've clearly already considered that for your use > patterns. >
Re: [PERFORM] Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3
Actually, thinking about this a little more, what we really want the planner to do is to consider the codes one at a time till it finds one that exists. If we write that out explicitly we get a good plan whether the ids select many rows or none.

=> explain analyze select 1 from (
     select * from (select 1 from exp_detls where hts_code_id in (469169) limit 1) a
     union all
     select * from (select 1 from exp_detls where hts_code_id in (15289) limit 1) b
     union all
     select * from (select 1 from exp_detls where hts_code_id in (468137) limit 1) c
     union all
     select * from (select 1 from exp_detls where hts_code_id in (14655) limit 1) d
     union all
     select * from (select 1 from exp_detls where hts_code_id in (14670) limit 1) e
     union all
     select * from (select 1 from exp_detls where hts_code_id in (15291) limit 1) f
   ) dummy limit 1;

                                     QUERY PLAN
 Limit (cost=0.00..1.75 rows=1 width=0) (actual time=0.031..0.032 rows=1 loops=1)
   ->  Result (cost=0.00..10.52 rows=6 width=0) (actual time=0.029..0.029 rows=1 loops=1)
         ->  Append (cost=0.00..10.52 rows=6 width=0) (actual time=0.029..0.029 rows=1 loops=1)
               ->  Subquery Scan on a (cost=0.00..1.72 rows=1 width=0) (actual time=0.027..0.027 rows=1 loops=1)
                     ->  Limit (cost=0.00..1.71 rows=1 width=0) (actual time=0.025..0.025 rows=1 loops=1)
                           ->  Index Scan using index_exp_detls_on_hts_code_id_and_data_month on exp_detls (cost=0.00..144890.56 rows=84657 width=0) (actual time=0.025..0.025 rows=1 loops=1)
                                 Index Cond: (hts_code_id = 469169)
               ->  Subquery Scan on b (cost=0.00..1.74 rows=1 width=0) (never executed)
                     ->  Limit (cost=0.00..1.73 rows=1 width=0) (never executed)
                           ->  Index Scan using index_exp_detls_on_hts_code_id_and_data_month on exp_detls (cost=0.00..118206.85 rows=68477 width=0) (never executed)
                                 Index Cond: (hts_code_id = 15289)
               ->  Subquery Scan on c (cost=0.00..1.74 rows=1 width=0) (never executed)
                     ->  Limit (cost=0.00..1.73 rows=1 width=0) (never executed)
                           ->  Index Scan using index_exp_detls_on_hts_code_id_and_data_month on exp_detls (cost=0.00..102645.38 rows=59168 width=0) (never executed)
                                 Index Cond: (hts_code_id = 468137)
               ->  Subquery Scan on d (cost=0.00..1.81 rows=1 width=0) (never executed)
                     ->  Limit (cost=0.00..1.80 rows=1 width=0) (never executed)
                           ->  Index Scan using index_exp_detls_on_hts_code_id_and_data_month on exp_detls (cost=0.00..2155.38 rows=1200 width=0) (never executed)
                                 Index Cond: (hts_code_id = 14655)
               ->  Subquery Scan on e (cost=0.00..1.75 rows=1 width=0) (never executed)
                     ->  Limit (cost=0.00..1.74 rows=1 width=0) (never executed)
                           ->  Index Scan using index_exp_detls_on_hts_code_id_and_data_month on exp_detls (cost=0.00..90309.37 rows=51853 width=0) (never executed)
                                 Index Cond: (hts_code_id = 14670)
               ->  Subquery Scan on f (cost=0.00..1.75 rows=1 width=0) (never executed)
                     ->  Limit (cost=0.00..1.74 rows=1 width=0) (never executed)
                           ->  Index Scan using index_exp_detls_on_hts_code_id_and_data_month on exp_detls (cost=0.00..84767.69 rows=48586 width=0) (never executed)
                                 Index Cond: (hts_code_id = 15291)
 Total runtime: 0.089 ms
(28 rows)

This is sub-millisecond for all combinations of ids, present or not, that we've tried, so we'll definitely go with this. Thanks for the help and explanations!

Tim

On Tue, Sep 27, 2011 at 8:40 AM, Timothy Garnett wrote:
> Hi Sam,
>
> The purpose of this (framework generated) code is to find out if there is
> at least one row that has one of the selected hts_code_ids. We don't care
> about anything that's returned other then whether at least one row exists or
> not (rewriting the query with EXISTS generates that same plan). The actual
> selectivity can vary greatly anywhere from 0 to > 500k rows depending on the
> codes chosen. On the higher end of the range a select count(*) takes ~14
> times longer then a limit 1 query that does an index scan (~475ms vs. 36ms).
> > What we're doing now for the moment is putting a straight-jacket on the > planner > > begin ; set local enable_seqscan = off ; SELECT "exp_detls".id FROM > "exp_detls" WHERE > ("exp_detls"."hts_code_id" IN (...)) LIMIT 1; commit ; > > As even when the where clause selects a good fraction of the table seq_scan > has highly variable performance because of the clust
Re: [PERFORM] : Tracking Full Table Scans
I would like to know the difference between "n_tup_upd" and "n_tup_hot_upd". Thanks VB On Tue, Sep 27, 2011 at 9:56 PM, Venkat Balaji wrote: > Thanks a lot Kevin !! > > Yes. I intended to track full table scans first to ensure that only small > tables or tables with very less pages are (as you said) getting scanned > full. > > I am yet to identify slow running queries. Will surely hit back with them > in future. > > Thanks > VB > > > > On Tue, Sep 27, 2011 at 8:02 PM, Kevin Grittner < > kevin.gritt...@wicourts.gov> wrote: > >> Venkat Balaji wrote: >> >> > I am preparing a plan to track the tables undergoing Full Table >> > Scans for most number of times. >> > >> > If i track seq_scan from the pg_stat_user_tables, will that help >> > (considering the latest analyzed ones) ? >> >> Well, yeah; but be careful not to assume that a sequential scan is >> always a bad thing. Here's our top ten tables for sequential scans >> in a database which is performing quite well: >> >> cc=> select seq_scan, n_live_tup, relname >> cc-> from pg_stat_user_tables >> cc-> order by seq_scan desc >> cc-> limit 10; >> seq_scan | n_live_tup | relname >> --++ >> 81264339 | 20 | MaintCode >> 16840299 | 3 | DbTranImageStatus >> 14905181 | 18 | ControlFeature >> 11908114 | 10 | AgingBoundary >> 8789288 | 22 | CtofcTypeCode >> 7786110 | 6 | PrefCounty >> 6303959 | 9 | ProtOrderHistEvent >> 5835430 | 1 | ControlRecord >> 5466806 | 1 | ControlAccounting >> 5202028 | 12 | ProtEventOrderType >> (10 rows) >> >> You'll notice that they are all very small tables. In all cases the >> entire heap fits in one page, so any form of indexed scan would at >> least double the number of pages visited, and slow things down. >> >> If you have queries which are not performing to expectations, your >> best bet might be to pick one of them and post it here, following >> the advice on this page: >> >> http://wiki.postgresql.org/wiki/SlowQueryQuestions >> >> -Kevin >> > >
[PERFORM] PostgreSQL-9.0 Monitoring System to improve performance
Hello Everyone,

I am implementing a PostgreSQL performance monitoring system (to monitor the below) which would help us understand the database behavior -

1. Big Full Table Scans
2. Tables with high IO (hot tables)
3. Highly used Indexes
4. Tables undergoing high DMLs with index scans 0 (with unused indexes)
5. Index usage for heap blk hits
6. Tracking Checkpoints
7. Tracking CPU, IO and memory usage ( by PG processes ) -- desperately needed
8. Buffer cache usage
9. Tables, Indexes and Database growth statistics

and more..

I am stuck at building a script or monitoring tool which gets us CPU usage, IO metrics and RAM usage of the database server. Can someone please help me achieve this? I need to monitor a 12 processor system with 6 cores. I need to know how each CPU is performing. Please help me know the availability of any open source monitoring tools or scripts for PG-9.0 on RHEL5. I will hit back with questions regarding monitoring in coming days.

Thanks
VB
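For the database-side items (2, 3, 5 and 8), the bundled statistics views already cover a lot of this. For example (a sketch; column names as in pg_statio_user_tables on 9.0):

```sql
-- Heap/index block reads vs. buffer hits per table (hot tables, cache usage).
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS heap_hit_pct,
       idx_blks_read,
       idx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read + idx_blks_read DESC
LIMIT 20;
```

For per-CPU, IO and memory numbers (item 7) you will still need an OS-level tool; on RHEL5 the sysstat package (sar -P ALL, iostat -x) is the usual starting point, and graphing tools such as Munin or Zabbix can collect both OS and PG metrics.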
Re: [PERFORM] : Tracking Full Table Scans
Thanks a lot Kevin !! Yes. I intended to track full table scans first to ensure that only small tables or tables with very less pages are (as you said) getting scanned full. I am yet to identify slow running queries. Will surely hit back with them in future. Thanks VB On Tue, Sep 27, 2011 at 8:02 PM, Kevin Grittner wrote: > Venkat Balaji wrote: > > > I am preparing a plan to track the tables undergoing Full Table > > Scans for most number of times. > > > > If i track seq_scan from the pg_stat_user_tables, will that help > > (considering the latest analyzed ones) ? > > Well, yeah; but be careful not to assume that a sequential scan is > always a bad thing. Here's our top ten tables for sequential scans > in a database which is performing quite well: > > cc=> select seq_scan, n_live_tup, relname > cc-> from pg_stat_user_tables > cc-> order by seq_scan desc > cc-> limit 10; > seq_scan | n_live_tup | relname > --++ > 81264339 | 20 | MaintCode > 16840299 | 3 | DbTranImageStatus > 14905181 | 18 | ControlFeature > 11908114 | 10 | AgingBoundary > 8789288 | 22 | CtofcTypeCode > 7786110 | 6 | PrefCounty > 6303959 | 9 | ProtOrderHistEvent > 5835430 | 1 | ControlRecord > 5466806 | 1 | ControlAccounting > 5202028 | 12 | ProtEventOrderType > (10 rows) > > You'll notice that they are all very small tables. In all cases the > entire heap fits in one page, so any form of indexed scan would at > least double the number of pages visited, and slow things down. > > If you have queries which are not performing to expectations, your > best bet might be to pick one of them and post it here, following > the advice on this page: > > http://wiki.postgresql.org/wiki/SlowQueryQuestions > > -Kevin >
Re: [PERFORM] [PERFORMANCE] Insights: fseek OR read_cluster?
Thank you, Marti, Is there any comprehensive survey of (at least most, if not all) modern features of operating systems, for example I/O scheduling, extent-based filesystems, etc.?
Re: [PERFORM] overzealous sorting?
On Tue, 27 Sep 2011 19:05:09 +1000, anthony.ship...@symstream.com wrote: > On Tuesday 27 September 2011 18:54, Marc Cousin wrote: > > The thing is, the optimizer doesn't know if your data will be in > > cache when you will run your query… if you are sure most of your > > data is in the cache most of the time, you could try to tune > > random_page_cost (lower it) to reflect that data is cached. But if > > the win is small on this query, it may not be worth it. > > What I really want is to just read a sequence of records in timestamp > order between two timestamps. The number of records to be read may be > in the millions totalling more than 1GB of data so I'm trying to read > them a slice at a time but I can't get PG to do just this. > > If I use offset and limit to grab a slice of the records from a large > timestamp range then PG will grab all of the records in the range, > sort them on disk and return just the slice I want. This is absurdly > slow. > > The query that I've shown is one of a sequence of queries with the > timestamp range progressing in steps of 1 hour through the timestamp > range. All I want PG to do is find the range in the index, find the > matching records in the table and return them. All of the planner's > cleverness just seems to get in the way. > Maybe you should try using a cursor, if you don't know where you'll stop. This associated with a very low cursor_tuple_fraction will probably give you what you want (a fast start plan).
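Concretely, something along these lines (a sketch only; the table and column names are made up to match the discussion):

```sql
-- Fast-start plan: tell the planner we only want the first rows quickly,
-- then fetch the slice through a cursor and stop early.
BEGIN;
SET LOCAL cursor_tuple_fraction = 0.01;
DECLARE slice NO SCROLL CURSOR FOR
    SELECT *
    FROM ts_table                      -- hypothetical table/column names
    WHERE ts >= '2011-09-27 00:00'
      AND ts <  '2011-09-28 00:00'
    ORDER BY ts;
FETCH 10000 FROM slice;                -- repeat until no rows come back
CLOSE slice;
COMMIT;
```

The cursor keeps one index scan open across fetches, so there is no re-sorting or re-seeking per slice, which is exactly what OFFSET/LIMIT was forcing.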
Re: [PERFORM] : Tracking Full Table Scans
Venkat Balaji wrote: > I am preparing a plan to track the tables undergoing Full Table > Scans for most number of times. > > If i track seq_scan from the pg_stat_user_tables, will that help > (considering the latest analyzed ones) ? Well, yeah; but be careful not to assume that a sequential scan is always a bad thing. Here's our top ten tables for sequential scans in a database which is performing quite well: cc=> select seq_scan, n_live_tup, relname cc-> from pg_stat_user_tables cc-> order by seq_scan desc cc-> limit 10; seq_scan | n_live_tup | relname --++ 81264339 | 20 | MaintCode 16840299 | 3 | DbTranImageStatus 14905181 | 18 | ControlFeature 11908114 | 10 | AgingBoundary 8789288 | 22 | CtofcTypeCode 7786110 | 6 | PrefCounty 6303959 | 9 | ProtOrderHistEvent 5835430 | 1 | ControlRecord 5466806 | 1 | ControlAccounting 5202028 | 12 | ProtEventOrderType (10 rows) You'll notice that they are all very small tables. In all cases the entire heap fits in one page, so any form of indexed scan would at least double the number of pages visited, and slow things down. If you have queries which are not performing to expectations, your best bet might be to pick one of them and post it here, following the advice on this page: http://wiki.postgresql.org/wiki/SlowQueryQuestions -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Ineffective autovacuum
On 27/09/2011, at 8:29 PM, Marti Raudsepp wrote: > 1. First things first: vacuum cannot delete tuples that are still > visible to any old running transactions. You might have some very long > queries or transactions that prevent it from cleaning properly: > > select * from pg_stat_activity where xact_start < now()-interval '10 minutes'; Thanks -- that query is very handy. I suspect this might be the cause of our woes as this query results in a handful of long lived connections, however they're connections to databases other than the one that I'm having trouble with. I've checked up on the FSM as you suggested, I don't think that's the problem as there're no warnings in the verbose output nor the logs. But another clue: DETAIL: 93 dead row versions cannot be removed yet. After clearing those stuffed transactions vacuum verbose manages to clear away all the dead rows… That's confirmation enough for me - Now to find the application bugs - Thanks Tom, Marti & Scott for your help! --Royce
Re: [PERFORM] Ineffective autovacuum
On Tue, Sep 27, 2011 at 7:49 AM, Tom Lane wrote: > Royce Ausburn writes: >> Since sending this first email I've up'd the autovacuum log level and I've >> noticed that the same tables seem to be auto vacuum'd over and over again… >> Some of the tables are a bit surprising in that they're updated >> semi-regularly, but not enough (I'd think) to warrant an autovacuum every >> few minutes… Is this unusual? > > Well, that proves autovacuum isn't getting blocked anyway. At this > point I suspect that Marti has fingered the correct issue: you likely > need to increase the FSM settings. You should try running a manual > VACUUM VERBOSE and see if it suggests that more FSM space is needed > (there'll be some FSM stats at the end of the verbose printout). That's the probably the best first step, a good second one might be to increase the aggressiveness of autovac by lowering the delay, and increasing the cost limit. OP: You need to watch it a little closer during the day. first do as suggested and increase the max_fsm_pages. High settings on it don't cost a lot as they're only 6 bytes per page. So 1M max_fsm_pages costs 6M of shared RAM. After that run vacuum verbose every hour or two to keep an eye on the trend of how many pages it says are needed. If that number doesn't stabilize, but just keeps growing then you're not vacuuming aggressively enough. Up autovacuum_vacuum_cost_limit by a couple of factors, and lower autovacuum_vacuum_cost_delay to 5ms or less. Make sure you don't swamp your IO subsystem. On big machines with lots of spindles it's hard to swamp the IO. On smaller workstation class machines it's pretty easy.
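In postgresql.conf terms that might look like the following (illustrative starting points for an 8.3-era server, not tuned recommendations for any particular workload):

```
# Free space map (8.3 and earlier); ~6 bytes of shared memory per page,
# so 1M pages costs about 6MB.
max_fsm_pages = 1000000
max_fsm_relations = 2000

# More aggressive autovacuum: lower delay, higher cost limit.
autovacuum_vacuum_cost_delay = 5ms
autovacuum_vacuum_cost_limit = 500
```

As Scott says, watch the FSM numbers at the end of VACUUM VERBOSE over time and keep raising max_fsm_pages until they stabilize.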
Re: [PERFORM] Ineffective autovacuum
Royce Ausburn writes: > Since sending this first email I've up'd the autovacuum log level and I've > noticed that the same tables seem to be auto vacuum'd over and over again… > Some of the tables are a bit surprising in that they're updated > semi-regularly, but not enough (I'd think) to warrant an autovacuum every few > minutes… Is this unusual? Well, that proves autovacuum isn't getting blocked anyway. At this point I suspect that Marti has fingered the correct issue: you likely need to increase the FSM settings. You should try running a manual VACUUM VERBOSE and see if it suggests that more FSM space is needed (there'll be some FSM stats at the end of the verbose printout). > Perhaps unrelated: I've done some digging around and happened across a > nightly task doing: > select pg_stat_reset() > on each of the databases in the cluster…. I've no idea why we're doing that > (and our usual sysadmin / DBA has resigned, so I doubt I'll ever know). > There must have been a reason at the time, but I wonder if this might be > interfering with things? Hmm, it's not helping any. Anything that needs vacuuming, but less often than once a day, would get missed due to the stats getting forgotten. regards, tom lane
[PERFORM] : Tracking Full Table Scans
Hello Everyone, I am preparing a plan to track the tables undergoing Full Table Scans for most number of times. If i track seq_scan from the pg_stat_user_tables, will that help (considering the latest analyzed ones) ? Please help ! Thanks VB
Re: [PERFORM] : Performance Improvement Strategy
Forgot to mention - Kevin, CLUSTER seems to be a very interesting concept to me. I am thinking to test the CLUSTER TABLE on our production according to the Index usage on the table. Will let you know once I get the results. Regards, VB On Tue, Sep 27, 2011 at 5:59 PM, Venkat Balaji wrote: > We had performed VACUUM FULL on our production and performance has improved > a lot ! > > I started using pg_stattuple and pg_freespacemap for tracking freespace in > the tables and Indexes and is helping us a lot. > > Thanks for all your inputs and help ! > > Regards, > VB > > > On Thu, Sep 22, 2011 at 12:11 AM, Kevin Grittner < > kevin.gritt...@wicourts.gov> wrote: > >> Venkat Balaji wrote: >> >> > If i got it correct, CLUSTER would do the same what VACUUM FULL >> > does (except being fast) >> >> CLUSTER copies the table (in the sequence of the specified index) to >> a new set of files, builds fresh indexes, and then replaces the >> original set of files with the new ones. So you do need room on >> disk for a second copy of the table, but it tends to be much faster >> then VACUUM FULL in PostgreSQL versions before 9.0. (Starting in >> 9.0, VACUUM FULL does the same thing as CLUSTER except that it scans >> the table data rather than using an index.) REINDEX is not needed >> when using CLUSTER or 9.x VACUUM FULL. Older versions of VACUUM >> FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was >> generally a good idea. >> >> When choosing an index for CLUSTER, pick one on which you often >> search for a *range* of rows, if possible. Like a name column if >> you do a lot of name searches. >> >> -Kevin >> > >
Re: [PERFORM] : Performance Improvement Strategy
We had performed VACUUM FULL on our production and performance has improved a lot ! I started using pg_stattuple and pg_freespacemap for tracking freespace in the tables and Indexes and is helping us a lot. Thanks for all your inputs and help ! Regards, VB On Thu, Sep 22, 2011 at 12:11 AM, Kevin Grittner < kevin.gritt...@wicourts.gov> wrote: > Venkat Balaji wrote: > > > If i got it correct, CLUSTER would do the same what VACUUM FULL > > does (except being fast) > > CLUSTER copies the table (in the sequence of the specified index) to > a new set of files, builds fresh indexes, and then replaces the > original set of files with the new ones. So you do need room on > disk for a second copy of the table, but it tends to be much faster > then VACUUM FULL in PostgreSQL versions before 9.0. (Starting in > 9.0, VACUUM FULL does the same thing as CLUSTER except that it scans > the table data rather than using an index.) REINDEX is not needed > when using CLUSTER or 9.x VACUUM FULL. Older versions of VACUUM > FULL would tend to bloat indexes, so a REINDEX after VACUUM FULL was > generally a good idea. > > When choosing an index for CLUSTER, pick one on which you often > search for a *range* of rows, if possible. Like a name column if > you do a lot of name searches. > > -Kevin >
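For anyone following along, the sequence Kevin describes looks roughly like this on 8.4+/9.0 (a sketch; the table and index names are invented, and pgstattuple comes from contrib):

```sql
-- Rewrite the table in index order; needs disk space for a full copy
-- and holds an ACCESS EXCLUSIVE lock while it runs.
CLUSTER exp_table USING exp_table_created_at_idx;
ANALYZE exp_table;

-- Verify the bloat is gone (pgstattuple is a contrib module).
SELECT table_len, dead_tuple_percent, free_percent
FROM pgstattuple('exp_table');
```

Note that CLUSTER is a one-time rewrite: the table does not stay clustered as new rows arrive, so it may need to be repeated periodically.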
Re: [PERFORM] Ineffective autovacuum
1. First things first: vacuum cannot delete tuples that are still visible to any old running transactions. You might have some very long queries or transactions that prevent it from cleaning properly: select * from pg_stat_activity where xact_start < now()-interval '10 minutes'; 2. On 8.3 and earlier servers with large tables, it's critical that your max_fsm_pages and max_fsm_relations are tuned properly. Failing that, autovacuum will permanently leak space that can only be fixed with a VACUUM FULL (which will take an exclusive lock and run for a very long time) PostgreSQL version 8.4 addressed this problem, but for the unfortunate, you have to follow the tuning advice here: https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#autovacuum_max_fsm_pages.2C_max_fsm_relations On Tue, Sep 27, 2011 at 08:08, Royce Ausburn wrote: > I've noticed that the same tables seem to be auto vacuum'd over and over > again… Some of the tables are a bit surprising in that they're updated > semi-regularly, but not enough (I'd think) to warrant an autovacuum every few > minutes… Is this unusual? Maybe they're just auto-analyze processes? Those get triggered on insert-only tables too, when vacuum normally wouldn't run. > Perhaps unrelated: I've done some digging around and happened across a > nightly task doing: > select pg_stat_reset() AFAIK (but I could be wrong), vacuum uses a separate set of statistics not affected by pg_stat_reset. Regards, Marti
Re: [PERFORM] overzealous sorting?
On 27/09/11 22:05, anthony.ship...@symstream.com wrote:

> What I really want is to just read a sequence of records in timestamp
> order between two timestamps. The number of records to be read may be
> in the millions, totalling more than 1GB of data, so I'm trying to read
> them a slice at a time but I can't get PG to do just this.
>
> If I use offset and limit to grab a slice of the records from a large
> timestamp range then PG will grab all of the records in the range, sort
> them on disk and return just the slice I want. This is absurdly slow.
>
> The query that I've shown is one of a sequence of queries with the
> timestamp range progressing in steps of 1 hour through the timestamp
> range. All I want PG to do is find the range in the index, find the
> matching records in the table and return them. All of the planner's
> cleverness just seems to get in the way.

It is not immediately clear that the planner is making the wrong choices here. Index scans are not always the best choice; it depends heavily on the correlation of the column concerned to the physical order of the table's heap file. I suspect the reason for the planner choosing the bitmap scan is that said correlation is low (consult pg_stats to see). Now if you think that the table's heap data is cached anyway, then this is not such an issue - but you have to tell the planner that by reducing random_page_cost (as advised previously).

Give it a try and report back!

regards
Mark
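The correlation Mark suggests consulting can be read directly from the pg_stats view; values near +1 or -1 mean the heap is laid out in roughly the same order as the column, which favours a plain index scan. The table and column names here are hypothetical stand-ins for the poster's schema.

```sql
-- correlation near +/-1: heap order matches column order, plain index
-- scans are cheap; near 0: heap access is scattered, bitmap scans win.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'events'   -- hypothetical table name
  AND attname  = 'ts';       -- hypothetical timestamp column
```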
Re: [PERFORM] overzealous sorting?
On Tuesday 27 September 2011 18:54, Marc Cousin wrote:
> The thing is, the optimizer doesn't know if your data will be in cache
> when you will run your query… if you are sure most of your data is in
> the cache most of the time, you could try to tune random_page_cost
> (lower it) to reflect that data is cached. But if the win is small on
> this query, it may not be worth it.

What I really want is to just read a sequence of records in timestamp order between two timestamps. The number of records to be read may be in the millions, totalling more than 1GB of data, so I'm trying to read them a slice at a time but I can't get PG to do just this.

If I use offset and limit to grab a slice of the records from a large timestamp range then PG will grab all of the records in the range, sort them on disk and return just the slice I want. This is absurdly slow.

The query that I've shown is one of a sequence of queries with the timestamp range progressing in steps of 1 hour through the timestamp range. All I want PG to do is find the range in the index, find the matching records in the table and return them. All of the planner's cleverness just seems to get in the way.

--
Anthony Shipman                 | Consider the set of blacklists that
anthony.ship...@symstream.com   | do not blacklist themselves...
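One standard way to get the slice-at-a-time behaviour described above without the OFFSET sort penalty is keyset pagination: restart each slice from the last timestamp seen rather than skipping rows. This is a sketch (the thread does not show the actual schema; table and column names are hypothetical), and it assumes the timestamp column is indexed.

```sql
-- First slice of the range:
SELECT * FROM events                       -- hypothetical table
WHERE ts >= '2011-09-01' AND ts < '2011-09-02'
ORDER BY ts
LIMIT 10000;

-- Next slice: continue from the last ts returned, instead of OFFSET.
-- The planner can walk the btree index from that point, no big sort.
SELECT * FROM events
WHERE ts > '2011-09-01 03:17:42'           -- last ts of previous slice
  AND ts < '2011-09-02'
ORDER BY ts
LIMIT 10000;
```

One caveat: if timestamps are not unique, `ts >` can skip rows that share the boundary value; a unique tiebreaker column (e.g. the primary key) in both the ORDER BY and the WHERE clause avoids that.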
Re: [PERFORM] overzealous sorting?
On Tue, 27 Sep 2011 12:45:00 +1000, anthony.ship...@symstream.com wrote:

> On Monday 26 September 2011 19:39, Marc Cousin wrote:
>> Because Index Scans are sorted, not Bitmap Index Scans, which build a
>> list of pages to visit, to be then visited by the Bitmap Heap Scan
>> step.
>>
>> Marc.
>
> Where does this bitmap index scan come from? It seems to negate the
> advantages of b-tree indexes described in the section "Indexes and
> ORDER BY" of the manual. If I do "set enable_bitmapscan = off;" the
> query runs a bit faster, although with a larger time range it reverts
> to a sequential scan.

A Bitmap Index Scan is just another way to use a btree index. It is often used when a bigger part of a table is required: it costs more than a plain index scan to retrieve a few records, but less when a lot of records are needed.

Your tests show that index scans are a bit faster on this query. But that is probably true only when most of the needed data is cached, which is probably your case, as you are doing tests using the same query all the time. The bitmap index scan is probably cheaper when data isn't in cache. You could also see the bitmap index scan as safer, as it won't perform as badly when data is not cached (less random IO) :)

The thing is, the optimizer doesn't know if your data will be in cache when you run your query… if you are sure most of your data is in the cache most of the time, you could try to tune random_page_cost (lower it) to reflect that data is cached. But if the win is small on this query, it may not be worth it.
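The tuning Marc describes can be tried per-session, so nothing is changed globally while experimenting. A sketch, with a hypothetical query standing in for the poster's:

```sql
-- Lower random_page_cost for this session only (server default is 4.0;
-- values approaching 1.0 tell the planner random IO is nearly as cheap
-- as sequential, i.e. the data is mostly cached).
SET random_page_cost = 2.0;

EXPLAIN ANALYZE
SELECT * FROM events               -- hypothetical table and column
WHERE ts >= '2011-09-01 00:00' AND ts < '2011-09-01 01:00'
ORDER BY ts;

-- Put the setting back when done comparing plans.
RESET random_page_cost;
```

If the cheaper setting makes the planner pick the plain index scan and EXPLAIN ANALYZE confirms it is faster, the value can then be set permanently in postgresql.conf.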