Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On Sep 18, 2007, at 1:14 AM, Joshua D. Drake wrote: Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I reindex/cluster indexes? If you overrun your max_fsm_pages, no: else yes; my algorithm is: if (true) then yes; my FSM is way bigger than I ever use (vacuum never reports shortage) and I still get bloat that needs to be purged out with a reindex on occasion. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On 24/09/2007, Vivek Khera [EMAIL PROTECTED] wrote: my FSM is way bigger than I ever use (vacuum never reports shortage) and I still get bloat that needs to be purged out with a reindex on occasion. Vivek, I feel your pain. But I seem to have (mostly) solved my problem in three ways: 1. Increase the shared_buffer and effective_cache_size settings in postgresql.conf. There are some websites that suggest that increasing shared_buffer beyond 40,000 may in fact have counter-intuitive results, but not in my case. I'm at 60,000 and it seems to work well. Effective_cache_size is 512000. 2. Reduce the fill factor on your table. This is the single most performance boost. On a table that is frequently updated on a TEXT column, I reduced it to 60 and have never looked back. For others, I'm experimenting with 80 and it seems to be working well. 3. Make your autovacuum settings as aggressive as can be. Basically I found that doing a cronjob of vacuuming every five hours worked really well, which suggested that autovacuum was not really kicking in as often it was needed. So I reduced the threshold (100 for vacuum, 80 for analyze...i.e., the number of tuples that get updated before either process kicks in) and reduced quite aggressively the scores. Here are my settings: autovacuum = on autovacuum_vacuum_cost_delay = 10 vacuum_cost_delay= 10 autovacuum_naptime = 10 autovacuum_vacuum_threshold = 75 autovacuum_analyze_threshold = 25 autovacuum_analyze_scale_factor = 0.02 autovacuum_vacuum_scale_factor = 0.01 Hope this helps some. PK. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On Sep 24, 2007, at 12:00 PM, Phoenix Kiula wrote: I feel your pain. But I seem to have (mostly) solved my problem in three ways: My particular usage pattern (add data continuously, purge *some* of the data once per week or every other week. The purge is what seems to kill it. Last time I reindexed, I reclaimed over 20Gb of disk space. That was after 6 months from the prior reindex. Recommending I run vacuum intermixed with the data purge is a non- starter; the vacuum on these tables takes a couple of hours. I'd never finish purging my data with that kind of delay. Recommending splitting my tables with inheritance is not going to work since my purges don't happen across any lines which would make sense for splitting with inheritance. I will investigate the fill-factor. That seems like it may make some sense the way I do inserts and updates... ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Vivek Khera [EMAIL PROTECTED] writes: On Sep 24, 2007, at 12:00 PM, Phoenix Kiula wrote: I feel your pain. But I seem to have (mostly) solved my problem in three ways: My particular usage pattern (add data continuously, purge *some* of the data once per week or every other week. The purge is what seems to kill it. Last time I reindexed, I reclaimed over 20Gb of disk space. That was after 6 months from the prior reindex. Do you have a pattern of loading a ton of data covering a range of indexed key values and then deleting all but a few values spread evenly throughout that range? And then never inserting new key values in those ranges again? For example loading records indexed by timestamp and then deleting all but the first record of the day. That kind of pattern does need a regular reindex because the index pages will have those few values left on them preventing them from being reused. If that's not your usage pattern then perhaps you should describe your usage pattern in more detail. But I suspect you would be best served by simply vacuuming much more often. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On 25/09/2007, Vivek Khera [EMAIL PROTECTED] wrote: Recommending I run vacuum intermixed with the data purge is a non- starter; the vacuum on these tables takes a couple of hours. I'd never finish purging my data with that kind of delay. ... I will investigate the fill-factor. That seems like it may make some sense the way I do inserts and updates... Undoubtedly. But if most of your indexed keys are gone, then a reindex is useful. If this is a hugely live system and you don't have a great number of indexes, then a somewhat kludgish way to try could be to create a copy of the table, do what you wish with it (delete rows, index them, then cluster them on that index)...and whenever the process finishes (3 hours, or 3 days...no matter, because it doesn't hurt your live system), you simply rename the old table to TABLE_OLD and the new table to TABLE. The renaming operation is instant. Anyway, what is your maintenance_work_mem? Try increasing your maintenance_work_mem and see if that helps vacuuming first. Vacuum operations can be sped up dramatically. We need regular vacuums and that is critical to our application, so I have a m_w_m of 512K. Mind you -- even if your DB vacuums for a couple hours, vacuum doesn't affect the performance of your live system while it is happening, so frequent vacuuming cannot hurt you one way or another, and it can surely help. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On 19/09/2007, Gregory Williamson [EMAIL PROTECTED] wrote: ... Can't speak directly to PostgreSQL but in Informix the fill factor is useful for tweaking indexes. A very high fill factor is useful for tables that are static -- any inserts or changes to the index trigger a *lot* of moving of b-tree branches. But the high fill factor means that each page has more useful data references in it. A very low fill factor means that pages are sparse and so inserts and updates are less likely to trigger massive b-tree rebalancings. Assuming pgsql's fill factor is similar to Informix' (yes, a highly suspect assumption), could we say: 1. A small fill factor such as 10 or 20 would be good for the index size and will not trigger massive btree rebalancings? (I'm first playing with a value of 60 for now and seeing how it works out...seems ok at the moment!) 2. Is this fill factor enough to have on the table, or should I also do a fill factor for specific indexes? Or both the table and the index? (I have four btree indexes on the table) Thanks ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On 19/09/2007, Richard Broersma Jr [EMAIL PROTECTED] wrote: --- Phoenix Kiula [EMAIL PROTECTED] wrote: 2. Is this fill factor enough to have on the table, or should I also do a fill factor for specific indexes? Or both the table and the index? (I have four btree indexes on the table) I don't think that fill factor can be applied to the table. The CREATE TABLE reference doc show that fill factor can be used in the CREATE TABLE statement, but it is only applied to syntax that creates an implied index. i.e. CREATE TABLE test ( test_idINTEGER PRIMARY KEY WITH ( FILLFACTOR = 70 ), test_val TEXT ); Primary key will create an implied index. Fill factor is applied to that implied index. Regards, Richard Broersma Jr. Then I am confused again about how the fill factor stuff works. Let's say I have a table with four BTREE indexes. Should all of them have a fill factor of about 60 (lower than the default 90, that is) to be effective? Or will it help if I lower the fill factor on only a couple of the most often used ones? The primary key index is very, very rarely updated so I don't need it to have a fill factor. I could try and see these one by one, but that's no better than touching/feeling the database blind-folded. I would love to see some writeup about this whole thing, but it seems hard to come by! Many thanks ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
--- Phoenix Kiula [EMAIL PROTECTED] wrote: Then I am confused again about how the fill factor stuff works. Let's say I have a table with four BTREE indexes. Should all of them have a fill factor of about 60 (lower than the default 90, that is) to be effective? Or will it help if I lower the fill factor on only a couple of the most often used ones? The primary key index is very, very rarely updated so I don't need it to have a fill factor. I could try and see these one by one, but that's no better than touching/feeling the database blind-folded. I would love to see some writeup about this whole thing, but it seems hard to come by! I will try to explain everything that I understand about indexes and tables. I am sure that if some part of my understanding is incorrect, someone will chime in to correct me. In PostgreSQL, tables are physically distinct from indexes. This means that any give table is written to disk as a file(s), and indexes are also written to disk as a separate file(s). A table and index are both saved to disk in segmented block referred to a pages (I believe the default size is 8K). The advantage of the index file is that it is significantly smaller in size, so it takes less time to sequentially scan and less time to read from disk. Now when you want to find a record, PostgreSQL will/(may choose to) sequentially scan the index until it find the record entry that corresponds with your criteria. This entry has a table cross-reference to the actual page that contains the record that is pointed at by the index. Lastly, the entire table page containing your record is read from disk to memory for further query processing. ASSUMPTION When you insert a record into a table that generates an entry into the b-tree index file, PostgreSQL will scan the pages of the index file to find the correct place and index page to add this entry. If the page is already full, PostgreSQL probably replaces the old full pages with two new pages with a distribution of that chunk of the B-tree index, and then adds the new entry to one of those pages. This operation can become very expensive if many new pages need to be created from single INSERT/UPDATE statement. /ASSUMPTION By using fill factor, you are telling PostgreSQL to automatically leave a portion of any newly created index page partially blank for future use. When a newly created index entry needs to be saved, it can be stored in one of the holes left in the index page. A large fill factor create both advantages and dis-advantages. For writing operations, it is a big advantage because, a large fill factor will leave alot of holes for INSERT and UPDATE operations to use. This can help increase the number of UPDATE/INSERT per seconds that you server can handle since, they index pages do not have to be rebuilt very often. However, the disadvantage is that, a newly created index with a large fill factor has index bloat designed into it. This mean that the index pages have a large portion of holes. So PostgreSQL will create more index pages than it would normally in order to hold both your index and the pre-defined holes that you specified by your fill-factor. Larger indexes require more time to be read from disk to memory and will require more time to sequentially scan to find to find the cross-reference table page location of records of interest. So the net effect is that larger indexes will make SELECT statement slower. This is my understanding for tables indexes and fill factor. I hope it helps. Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Phoenix Kiula wrote: Then I am confused again about how the fill factor stuff works. Let's say I have a table with four BTREE indexes. Should all of them have a fill factor of about 60 (lower than the default 90, that is) to be effective? Or will it help if I lower the fill factor on only a couple of the most often used ones? The primary key index is very, very rarely updated so I don't need it to have a fill factor. I could try and see these one by one, but that's no better than touching/feeling the database blind-folded. I would love to see some writeup about this whole thing, but it seems hard to come by! Let me present my understanding; I hope I won't confuse issues further. Indexes and tables are bath organized in pages, each page contains several entries or rows. When an INSERT or UPDATE on the table occurs, a new row (version) is created (and the old version of the row will be freed upon VACUUM). Any index entry that points to this row will have to be changed because the location of the row has changed. That means that there will also have to be a new entry in the index, even if the key has not changed (simply modifying the existing index entry to point to the new row location won't do, because there may be transactions that still need the old version of the row). Reducing fillfactor on tables (default 100) will reduce the number of table pages that need to be touched during an UPDATE. Reducing fillfactor on a B-tree index (default 90) will reduce the frequency of page splits that can happen upon INSERT or UPDATE. Both at the cost of wasting some disk (and memory) space. So I *guess* that when you decide that a table will be heavily updated and you want to reduce disk I/O at the cost of wasting some space, it will be a good idea to reduce fillfactor on the table and all its indexes. I emphasize the guess because a) I may have made a mistake in my deductions :^) and b) I cannot tell you good numbers to choose. As in most performance tuning questions, the best thing you can probably is to test and compare various settings and see which performs best for you Yours, Laurenz Albe ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On Sep 19, 2007, at 9:29 AM, Richard Broersma Jr wrote: --- Phoenix Kiula [EMAIL PROTECTED] wrote: Then I am confused again about how the fill factor stuff works. Let's say I have a table with four BTREE indexes. Should all of them have a fill factor of about 60 (lower than the default 90, that is) to be effective? Or will it help if I lower the fill factor on only a couple of the most often used ones? The primary key index is very, very rarely updated so I don't need it to have a fill factor. I could try and see these one by one, but that's no better than touching/feeling the database blind-folded. I would love to see some writeup about this whole thing, but it seems hard to come by! I will try to explain everything that I understand about indexes and tables. I am sure that if some part of my understanding is incorrect, someone will chime in to correct me. In PostgreSQL, tables are physically distinct from indexes. This means that any give table is written to disk as a file(s), and indexes are also written to disk as a separate file(s). A table and index are both saved to disk in segmented block referred to a pages (I believe the default size is 8K). Yes, that can be changed at compile time, although I don't think I've ever heard of any advantages to doing that. The advantage of the index file is that it is significantly smaller in size, so it takes less time to sequentially scan and less time to read from disk. Now when you want to find a record, PostgreSQL will/(may choose to) sequentially scan the index until it find the record entry that corresponds with your criteria. This entry has a table cross-reference to the actual page that contains the record that is pointed at by the index. Lastly, the entire table page containing your record is read from disk to memory for further query processing. ASSUMPTION When you insert a record into a table that generates an entry into the b-tree index file, PostgreSQL will scan the pages of the index file to find the correct place and index page to add this entry. If the page is already full, PostgreSQL probably replaces the old full pages with two new pages with a distribution of that chunk of the B-tree index, and then adds the new entry to one of those pages. This operation can become very expensive if many new pages need to be created from single INSERT/UPDATE statement. /ASSUMPTION Yes, the point of B-trees is that they have fast lookup times, but updates can be expensive when you have to re-balance your leaf nodes. By using fill factor, you are telling PostgreSQL to automatically leave a portion of any newly created index page partially blank for future use. When a newly created index entry needs to be saved, it can be stored in one of the holes left in the index page. That future use is only for updates. A large fill factor create both advantages and dis-advantages. For writing operations, it is a big advantage because, a large fill factor will leave alot of holes for INSERT and UPDATE operations to use. This can help increase the number of UPDATE/ INSERT per seconds that you server can handle since, they index pages do not have to be rebuilt very often. Not quite. Once a page has reached it's fill factor percentage full, no more inserts will happen on that page, only updates. Also, I think you have large/small backwards wrt fill factor. If you have a fill factor of, say, 40% then once a page has reached 40% full no more inserts will happen (unless some space is reclaimed by vacuum). So, smaller fill factors == bigger holes. The bigger the fill factor, the smaller the whole: if you have a fill factor of 90%, only 10% is reserved for updates of rows on that page. However, the disadvantage is that, a newly created index with a large fill factor has index bloat designed into it. This mean that the index pages have a large portion of holes. So PostgreSQL will create more index pages than it would normally in order to hold both your index and the pre-defined holes that you specified by your fill-factor. Larger indexes require more time to be read from disk to memory and will require more time to sequentially scan to find to find the cross-reference table page location of records of interest. So the net effect is that larger indexes will make SELECT statement slower. This is my understanding for tables indexes and fill factor. I hope it helps. Again, with the large v. small fill factor point. Using fill factor seems to be a trade-off between space and update efficiency. Let's say that after so many (potential) updates you know that each row will become static, i.e. no more updates will happen. Let's use some numbers and make them easy to work with. Say each row will be updated exactly once and you use a fill factor of 50%. Now, say 5K of fresh index data is written. The first 4K will go into one page at which
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Not quite. Once a page has reached it's fill factor percentage full, no more inserts will happen on that page, only updates. Also, I think you have large/small backwards wrt fill factor. If you have a fill factor of, say, 40% then once a page has reached 40% full no more inserts will happen (unless some space is reclaimed by vacuum). So, smaller fill factors == bigger holes. The bigger the fill factor, the smaller the whole: if you have a fill factor of 90%, only 10% is reserved for updates of rows on that page. So (just to reiterate), fill factor can be applied to both a table and/or an index(es). But the holes built into the page of a table or index can only be filled by UPDATE Statements. Thanks for the clarification! Regards, Richard Broesma Jr. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
--- Phoenix Kiula [EMAIL PROTECTED] wrote: 2. Is this fill factor enough to have on the table, or should I also do a fill factor for specific indexes? Or both the table and the index? (I have four btree indexes on the table) I don't think that fill factor can be applied to the table. The CREATE TABLE reference doc show that fill factor can be used in the CREATE TABLE statement, but it is only applied to syntax that creates an implied index. i.e. CREATE TABLE test ( test_idINTEGER PRIMARY KEY WITH ( FILLFACTOR = 70 ), test_val TEXT ); Primary key will create an implied index. Fill factor is applied to that implied index. Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On Sep 19, 2007, at 10:30 AM, Richard Broersma Jr wrote: Not quite. Once a page has reached it's fill factor percentage full, no more inserts will happen on that page, only updates. Also, I think you have large/small backwards wrt fill factor. If you have a fill factor of, say, 40% then once a page has reached 40% full no more inserts will happen (unless some space is reclaimed by vacuum). So, smaller fill factors == bigger holes. The bigger the fill factor, the smaller the whole: if you have a fill factor of 90%, only 10% is reserved for updates of rows on that page. So (just to reiterate), fill factor can be applied to both a table and/or an index(es). But the holes built into the page of a table or index can only be filled by UPDATE Statements. Thanks for the clarification! Yep. Although, to be strictly honest, I guess the term UPDATE isn't the best term to use for indexes. My description works best for tables, see the section on FILLFACTOR in http://www.postgresql.org/ docs/8.2/interactive/sql-createindex.html for a better description of what happens for indexes -- slightly different semantics, but the same general effect. Also, note that once we have HOT, figuring out fill factor for indexes will be a whole different ball game. Currently, an update to any tuple in a table, results in a new index entry. With hot, index entries will only happen if the indexed column is changed in the update. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
--- Erik Jones [EMAIL PROTECTED] wrote: Also, note that once we have HOT... I am not sure what the acronym HOT stands for. Does it have something to do with MVCC? Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On Sep 19, 2007, at 11:00 AM, Richard Broersma Jr wrote: --- Erik Jones [EMAIL PROTECTED] wrote: Also, note that once we have HOT... I am not sure what the acronym HOT stands for. Does it have something to do with MVCC? Heap Only Tuple. Here's a link to the (latest?) readme for it: http://archives.postgresql.org/pgsql-patches/2007-09/msg00261.php Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Ow Mun Heng wrote: On Mon, 2007-09-17 at 22:14 -0700, Joshua D. Drake wrote: Phoenix Kiula wrote: So a YES/NO question: Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I reindex/cluster indexes? If you overrun your max_fsm_pages, no: else yes; Maybe my english suck, but I don't understand the above answer. If I overrun my Max_FSM_pages then a vacuum analyse is enough to return it back to normal. If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If you do not overrun your max_fsm_pages, yes vacuum analyze can deal with the issue. Joshua D. Drake If I _didn't_ overrun my fsm, then a reindex/cluster is necessary. Did I get that right? (I feel it's wrong and a reindex/cluster is needed only when I overrun my max_fsm) - -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG72kwATb/zqfZUUQRAqNMAJsFjqWirgGF+VlEIwaVDnxBAefeSwCfesD1 osqiudjcEY/tyibvNZRJ/UU= =apjz -END PGP SIGNATURE- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
2007/9/18, Joshua D. Drake [EMAIL PROTECTED]: If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If you do not overrun your max_fsm_pages, yes vacuum analyze can deal with the issue. Are you sure? I have a situation where above is no true. postgres version 8.1.8. while vacuum verbose says: INFO: free space map contains 2329221 pages in 490 relations DETAIL: A total of 2345744 page slots are in use (including overhead). 2345744 page slots are required to track all free space. Current limits are: 1000 page slots, 1000 relations, using 58698 KB. ... and we have constant problem with index bloat and need to REINDEX frequently. the database is very redundant and has quite hight data retention rate (it's an ecommerce site) -- Filip Rembiałkowski ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On Tue, 2007-09-18 at 06:01 -0400, Bill Moran wrote: * (with newer version) reduce the fill factor and REINDEX What is fill factor? ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Filip Rembiałkowski [EMAIL PROTECTED] wrote: 2007/9/18, Joshua D. Drake [EMAIL PROTECTED]: If you overrun your max_fsm_pages then vacuum analyze IS NOT ENOUGH. If you do not overrun your max_fsm_pages, yes vacuum analyze can deal with the issue. Are you sure? I have a situation where above is no true. postgres version 8.1.8. while vacuum verbose says: INFO: free space map contains 2329221 pages in 490 relations DETAIL: A total of 2345744 page slots are in use (including overhead). 2345744 page slots are required to track all free space. Current limits are: 1000 page slots, 1000 relations, using 58698 KB. ... and we have constant problem with index bloat and need to REINDEX frequently. the database is very redundant and has quite hight data retention rate (it's an ecommerce site) I've been involved in a number of the discussions on this, and I think part of the confusion stems from the fact that index bloat is an ambiguous term. If the index gets large enough that it no longer fits in shared memory, and reindexing it will reduce its size to where it _will_ fit in shared memory, then the index _could_ be said to be bloated. However, an equally valid solution to that problem is to increase the amount of shared memory available (possibly by adding RAM). Unfortunately, folks like Phoenix are looking for yes/no answers, and with many of these questions, the _correct_ answer is it depends on your workload If you find that reindexing improves performance, then you should investigate further. Depending on the exact nature of the problem, there are many possible solutions, three that come to mind: * Add RAM/SHM * REINDEX on a regular schedule * (with newer version) reduce the fill factor and REINDEX -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On 9/18/07, Ow Mun Heng [EMAIL PROTECTED] wrote: On Tue, 2007-09-18 at 06:01 -0400, Bill Moran wrote: * (with newer version) reduce the fill factor and REINDEX What is fill factor? See Index Storage Parameters: http://www.postgresql.org/docs/8.2/static/sql-createindex.html ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Hi, Can I add SHM with merely by managing the entry in sysctl.conf? My current values: kernel.shmmax = 536870912 kernel.shmall = 536870912 My shared_buffers in postgresql.conf is 2. From the website http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax should be sharedbuffer*8192, so I suppose my shmmax can be much lower than the above, but I raised it for performance. Am I wrong to do so? You need to configure the kernel so it allows processes to use more shared memory. This does not mean that a process automatically uses it. For PostgreSQL you will need to increase shared_buffers to make it use the extra available shared memory. With your shared memory settings you can probably increase shared_buffers to about 65000. With the 'ipcs' command you can see how much shared memory PostgreSQL uses. Look under 'Shared Memory Segments' to memory owned by user postgres. - Sander ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Unfortunately, folks like Phoenix are looking for yes/no answers, and with many of these questions, the _correct_ answer is it depends on your workload I wanted merely to simplify the advice that gets dispensed on this list, often conflicting to novice ears like mine. So I appreciate your notes very much. If you find that reindexing improves performance, then you should investigate further. Depending on the exact nature of the problem, there are many possible solutions, three that come to mind: * Add RAM/SHM Can I add SHM with merely by managing the entry in sysctl.conf? My current values: kernel.shmmax = 536870912 kernel.shmall = 536870912 My shared_buffers in postgresql.conf is 2. From the website http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax should be sharedbuffer*8192, so I suppose my shmmax can be much lower than the above, but I raised it for performance. Am I wrong to do so? * REINDEX on a regular schedule This is sadly not really feasible, because we need to offer a 100% availability website. REINDEX does not work concurrently so it is not really an option for us. My max_fsm_pages and max_fsm_relations are way above the numbers that come up after the VACUUM ANALYZE VERBOSE run. But still, the autovacuum stuff seems like it is not working at all. Some related entries in the conf file: autovacuum = on autovacuum_vacuum_cost_delay = 20 vacuum_cost_delay= 20 autovacuum_naptime = 30 stats_start_collector= on stats_row_level = on autovacuum_vacuum_threshold = 80 autovacuum_analyze_threshold = 80 And yet, the db often slows down, at which point I manually login and run a manual VACUUM ANALYZE and it seems fine for some more time. Sometimes, I also restart pgsql and that seems to help for a while. Another advice on these forums is to see vmstat 1, without actually specifying how to draw inferences from it. The free version of it is coming up at decent rates, as follows: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa 1 0 29124 110760 108980 346773601 206 1400 4 2 1 85 12 0 0 29124 110632 108980 346773600 0 0 1052 108 0 0 100 0 2 0 29124 108840 108980 346773600 0 0 1112 299 1 1 98 0 1 0 29124 109288 108980 346773600 0 0 1073 319 2 1 98 0 . * (with newer version) reduce the fill factor and REINDEX I think some of my tables are updated very frequently so a smaller fill factor will be nice. How can I find the current fill factor on my tables? Also, is there some method or science to calculating a decent fill factor -- size of table, number of indexes, frequency of updates, and such? We have one major table which faces a lot of INSERTs and UPDATES in a day (up to 10,000) but many many more SELECTs (up to 10 million). Thanks. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Hi, Now, I can merrily increase the shared_buffers, but the manual warns me against increasing the value too much because it is per transaction value. Shared_buffers is not per-transaction. Where did you find this information? - Sander ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On 18/09/2007, Sander Steffann [EMAIL PROTECTED] wrote: Hi, Can I add SHM with merely by managing the entry in sysctl.conf? My current values: kernel.shmmax = 536870912 kernel.shmall = 536870912 My shared_buffers in postgresql.conf is 2. From the website http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax should be sharedbuffer*8192, so I suppose my shmmax can be much lower than the above, but I raised it for performance. Am I wrong to do so? You need to configure the kernel so it allows processes to use more shared memory. This does not mean that a process automatically uses it. For PostgreSQL you will need to increase shared_buffers to make it use the extra available shared memory. With your shared memory settings you can probably increase shared_buffers to about 65000. Thanks, the IPCS command shows me this: -- Shared Memory Segments keyshmid owner perms bytes nattch status 0x0052e2c1 6782976postgres 600176668672 2 Now, I can merrily increase the shared_buffers, but the manual warns me against increasing the value too much because it is per transaction value. So here's the conflict for a novice like me: 1. Do not increase shared_buffer too much because it is per-transaction. 2. Do increase the SHM for performance, but it is only useful if you also increase shared_buffer. So which is it? Would it help to increase the effective_cache_size? It is currently at 512000. I have 4GB ram on the machine, but am willing to devote about 2GB to pgsql. Thanks! ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
In response to Phoenix Kiula [EMAIL PROTECTED]: If you find that reindexing improves performance, then you should investigate further. Depending on the exact nature of the problem, there are many possible solutions, three that come to mind: * Add RAM/SHM Can I add SHM with merely by managing the entry in sysctl.conf? My current values: kernel.shmmax = 536870912 kernel.shmall = 536870912 These values define the max allowed. They exist to keep poorly written applications from sucking up all the available memory. Setting them higher than is needed does not cause any problems, unless a greedy or poorly-written application grabs all that memory. My shared_buffers in postgresql.conf is 2. From the website http://www.desknow.com/kb/idx/12/061/article/ I notice that shmmax should be sharedbuffer*8192, so I suppose my shmmax can be much lower than the above, but I raised it for performance. Am I wrong to do so? It's completely impossible to tell without knowing more about your physical hardware. The rule of thumb is 1/3 physical RAM to start, then adjust if more or less seems to help. That advice is for versions of PG = 8. If you're still running a 7.X version, upgrade. How much RAM does this system have in it? Unless you have other applications running on this system using RAM, you should allocate more of it to shared_buffers. If 160M is 1/3 your RAM, you probably need to add more RAM. How big is your database? If it's possible to fit it all in shared_buffers, that will give you the best performance. * REINDEX on a regular schedule This is sadly not really feasible, because we need to offer a 100% availability website. REINDEX does not work concurrently so it is not really an option for us. My max_fsm_pages and max_fsm_relations are way above the numbers that come up after the VACUUM ANALYZE VERBOSE run. Hence my comment about depending on your workload and investigating the situation to determine the best solution. But still, the autovacuum stuff seems like it is not working at all. Some related entries in the conf file: autovacuum = on autovacuum_vacuum_cost_delay = 20 vacuum_cost_delay= 20 autovacuum_naptime = 30 stats_start_collector= on stats_row_level = on autovacuum_vacuum_threshold = 80 autovacuum_analyze_threshold = 80 And yet, the db often slows down, at which point I manually login and run a manual VACUUM ANALYZE and it seems fine for some more time. Sometimes, I also restart pgsql and that seems to help for a while. You don't mention *_scale_factor settings. Those are going to be important as well. Based on your symptoms, it sounds like autovacuum is not getting those tables vacuumed enough. I recommend raising the debug level and watching the logs to see if autovacuum is actually getting tables vacuumed. Consider lowering your *_scale_factor values if not. Or even reducing autovacuum_naptime. Another advice on these forums is to see vmstat 1, without actually specifying how to draw inferences from it. The free version of it is coming up at decent rates, as follows: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa 1 0 29124 110760 108980 346773601 206 1400 4 2 1 85 12 0 0 29124 110632 108980 346773600 0 0 1052 108 0 0 100 0 2 0 29124 108840 108980 346773600 0 0 1112 299 1 1 98 0 1 0 29124 109288 108980 346773600 0 0 1073 319 2 1 98 0 . Explaining how to interpret the output of this command and determine what to do with it is not something easily done in a short paragraph. However, it looks like you've got a lot of RAM being used for the disk cache. That memory would probably be better used as shared_buffers, so I suggest you increase that value considerably. * (with newer version) reduce the fill factor and REINDEX I think some of my tables are updated very frequently so a smaller fill factor will be nice. How can I find the current fill factor on my tables? Also, is there some method or science to calculating a decent fill factor -- size of table, number of indexes, frequency of updates, and such? We have one major table which faces a lot of INSERTs and UPDATES in a day (up to 10,000) but many many more SELECTs (up to 10 million). I'm not sure how to find the current value, but a smaller fill factor on busy tables should lead to less fragmentation, thus more efficient indexes over time. Keep in mind that a smaller fill factor will also lead to larger indexes initially. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
index fillfactor (was Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER)
In response to Phoenix Kiula [EMAIL PROTECTED]: Thanks for a very informative post! One question: I'm not sure how to find the current value, but a smaller fill factor on busy tables should lead to less fragmentation, thus more efficient indexes over time. Keep in mind that a smaller fill factor will also lead to larger indexes initially. What constitutes a small fill factor? Would 70 be good? Unfortunately, I can't say. I have not yet had the opportunity to experiment with different fillfactors, so I can only speak in vague estimations on this topic. I guess my current must have been the default, which the manual says is 100. I expect it's at the default, but the docs say that is 90%: http://www.postgresql.org/docs/8.2/static/sql-createindex.html Where did you see 100? Or did you mean really small fill factor like 20? In this context, what is packing in the manual -- is that some kind of compression? Hopefully, someone more knowledgeable will chime in with some wise suggestions. Barring that, I can only suggest you experiment to find what works for your workload, but don't rule out the possibility that extremely low fillfactor values might work well for you. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
--- Phoenix Kiula [EMAIL PROTECTED] wrote: What constitutes a small fill factor? Would 70 be good? I guess my current must have been the default, which the manual says is 100. On the following link: http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html#SQL-CREATEINDEX-STORAGE-PARAMETERS I found this: B-trees use a default fillfactor of 90, but any value from 10 to 100 can be selected. Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Thanks for a very informative post! One question: I'm not sure how to find the current value, but a smaller fill factor on busy tables should lead to less fragmentation, thus more efficient indexes over time. Keep in mind that a smaller fill factor will also lead to larger indexes initially. What constitutes a small fill factor? Would 70 be good? I guess my current must have been the default, which the manual says is 100. Or did you mean really small fill factor like 20? In this context, what is packing in the manual -- is that some kind of compression? ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
--- Gregory Williamson [EMAIL PROTECTED] wrote: A very low fill factor means that pages are sparse and so inserts and updates are less likely to trigger massive b-tree rebalancings. I take it that massive b-tree rebalancings could cause a problem with the performance of disk writing though-put from UPDATEs and INSERTs? Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Richard Broersma Jr wrote: --- Gregory Williamson [EMAIL PROTECTED] wrote: A very low fill factor means that pages are sparse and so inserts and updates are less likely to trigger massive b-tree rebalancings. I take it that massive b-tree rebalancings could cause a problem with the performance of disk writing though-put from UPDATEs and INSERTs? Regards, Richard Broersma Jr. Precisely -- even if it can keep everything in RAM it can occupy quite a few cycles to rebalance a large b-tree. And eventually those changes do need to get written to disk so the next checkpoint (I think) will also have more work. G ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
Sorry for top-posting -- challenged reader. Can't speak directly to PostgreSQL but in Informix the fill factor is useful for tweaking indexes. A very high fill factor is useful for tables that are static -- any inserts or changes to the index trigger a *lot* of moving of b-tree branches. But the high fill factor means that each page has more useful data references in it. A very low fill factor means that pages are sparse and so inserts and updates are less likely to trigger massive b-tree rebalancings. I've never used it on PostgreSQL (yet!) but am looking forward to it. Beware of premature optimization! HTH, Greg Williamson Senior DBA GlobeXplorer LLC, a DigitalGlobe company Confidentiality Notice: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information and must be protected in accordance with those provisions. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. (My corporate masters made me say this.) -Original Message- From: [EMAIL PROTECTED] on behalf of Richard Broersma Jr Sent: Tue 9/18/2007 10:29 AM To: Phoenix Kiula; Bill Moran Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER --- Phoenix Kiula [EMAIL PROTECTED] wrote: What constitutes a small fill factor? Would 70 be good? I guess my current must have been the default, which the manual says is 100. On the following link: http://www.postgresql.org/docs/8.2/interactive/sql-createindex.html#SQL-CREATEINDEX-STORAGE-PARAMETERS I found this: B-trees use a default fillfactor of 90, but any value from 10 to 100 can be selected. Regards, Richard Broersma Jr. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Phoenix Kiula wrote: The manual is vague. Several threads about this, in language that is ambiguous to me. So a YES/NO question: Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I reindex/cluster indexes? If you overrun your max_fsm_pages, no: else yes; Sincerely, Joshua D. Drake Thanks. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq - -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFG717QATb/zqfZUUQRAh6uAJ9CGXbA2BxXvMbSZP9Gv8gI9QBkXgCePhqe 6aS3fp60g7YrWECspTVcxyE= =u2o/ -END PGP SIGNATURE- ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] For index bloat: VACUUM ANALYZE vs REINDEX/CLUSTER
On Mon, 2007-09-17 at 22:14 -0700, Joshua D. Drake wrote: Phoenix Kiula wrote: So a YES/NO question: Q: To get rid of index bloat, is a VACUUM ANALYZE enough? Or must I reindex/cluster indexes? If you overrun your max_fsm_pages, no: else yes; Maybe my english suck, but I don't understand the above answer. If I overrun my Max_FSM_pages then a vacuum analyse is enough to return it back to normal. If I _didn't_ overrun my fsm, then a reindex/cluster is necessary. Did I get that right? (I feel it's wrong and a reindex/cluster is needed only when I overrun my max_fsm) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings