Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
In response to Aleksey Tsalolikhin atsaloli.t...@gmail.com: On Mon, Jan 31, 2011 at 7:52 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Jan 31, 2011 at 5:54 PM, Aleksey Tsalolikhin atsaloli.t...@gmail.com wrote: Situation: Disk usage on production server root filesystem is at 68% utilization (80 GB used), on DR is at 51% (56 GB used). We use SlonyII-1.2.x to keep the DR up to date. I would like to account for the 24 GB difference. This is likely free space in your database. Some of it is completely normal and actually improves performance. Too much and your db is bloated and things starting taking too long. Thanks, Scott! Bucardo's check_postgres.pl --action bloat complains about one table, 1 GB wasted. So the other tables must be OK. So what about my DR, which doesn't have this same 20+ GB of free space. Will it acquire it once it goes into production? Will performance be impacted as it acquires the free space? Should I even be concerned about the difference in disk usage or is it normal and expected? Difference in free space from master to slaves is typical. Transactions run on the slaves differently than on the master. For example, if you rollback transactions on the master, that can bloat tables, but those activities are never communicated to the slaves because the rollback doesn't alter any data. It's also possible that you have different autovacuum configs on the two different machines (have you checked) or that the hardware isn't the same, thus one is able to vacuum more successfully than the other, or that simply the fates have caused vacuum to start at times that it gets more done on one server than the other. Do not be afraid of vacuum full. It's not that it's an evil command or should never be used, etc. It's just something that has consequences that you need to be aware of, such as: *) It can take a long time *) It locks tables while it works on them, thus it blocks other processes from accessing those tables *) It can cause index bloat However, there are mitigating factors: *) You can tell it which tables to vacuum, thus you can vacuum full one table at a time to recduce the overall impact *) It can be interrupted, so if it's taking longer than you're able to wait, you can cancel it. *) You can use the REINDEX command to clean up index bloat. Based on personal experience, and the fact that you have a slony slave to work with, I recommend the following: 1) On the Slony slave, do the following, timing each step so you have an estimate of how long they will take on the master 1a) VACUUM the table. This is non-locking and will do some preliminary work so that VACUUM FULL takes less time. 1b) VACUUM FULL just that table. Slony will be unable to replicate to the table while the FULL is running, but that's OK, it will catch up after it's done and the master won't be interrupted. 1c) REINDEX just that table. This will have no effect on the master. 2) Now that you have time estimates for all those steps, add the times for 1b and 1c together. This is an estimate of how long the master database will be interrupted while you do maintenance (step 1a does not interrupt other work going on). Schedule downtime for about 2x that time, just in case things run a little longer. 3) Run steps 1a - 1c on the master. Start 1a before your maintenance window starts, with enough time that it should be finished before your maintenance window. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
atsaloli.t...@gmail.com (Aleksey Tsalolikhin) writes: Situation: Disk usage on production server root filesystem is at 68% utilization (80 GB used), on DR is at 51% (56 GB used). We use SlonyII-1.2.x to keep the DR up to date. I would like to account for the 24 GB difference. It's more than likely a result of transactions failing on the origin, leaving dead space around, where replication doesn't bother trying to do any work for the failed stuff, with the consequence that there's no corresponding clutter on the replica. I'm talking here about cases of failures that are expected. Look to what activities you have that tend to lead to tranactions that ROLLBACK. Slony-I makes no attempt to replicate activity that is terminated by ROLLBACK (explicit or implicit), so all that activity won't be processed on replicas. For instance, in our applications, operating domain registries, intentionally failed database transactions occur heavily *common* whenever customers are 'fighting' over domain names - one and only one customer can win the name, while all others lose, and each losing request leaves a certain amount of mess in its wake. Common patterns of this sort include transactions that fail because: - Customer has insufficient funds on account to pay for the transaction - Inventory request fails because there are insufficient items in stock - Attempt to insert a second instance of an object that is required to be unique - Rejection of partially processed transaction due to violation of some business policy (which is mighty open-ended!) It's likely, as well, that there is some set of tables that you are not vacuuming heavily enough. Probably a table or three needs to have CLUSTER run on it to bring them down to size, and you may need to fiddle with autovacuum parameters to vacuum more frequently. -- (reverse (concatenate 'string moc.liamg @ enworbbc)) http://linuxdatabases.info/info/lisp.html Microsoft has world class quality control -- Arthur Norman -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
peter.geoghega...@gmail.com (Peter Geoghegan) writes: On 1 February 2011 03:52, Scott Marlowe scott.marl...@gmail.com wrote: You can reclaim that space by doing a cluster or vacuum full on the subject table. Yes, but this is a fairly bad idea, particularly prior to PG 9.0 . 9.0 has a new vacuum full implementation that makes it not so bad - it just rewrites the entire table. VACUUM FULL will take exclusive locks on tables being vacuumed. It also causes index bloat. You should be very careful about using it on a production system. I'm not sure why you'd advocate CLUSTER as a way to reclaim disk space. Because it works pretty well; it reorganizes the table on the basis of the order indicated by one index, and simultaneously: a) Shortens the table, removing all dead space; b) Regenerates all indices, so they too have no dead space. Traditional VACUUM FULL tends to worsen the dead space problem on indices, so adds the insult to injury problem that after running VACUUM FULL, you might need to reindex, and that aftermath is nearly as expensive as CLUSTER. CLUSTER is likely to be quicker than VACUUM FULL, and it gives nice, squeaky-tight indexes. The new form of VACUUM FULL in 9.0 changes things, but it wasn't obvious that the original poster was on 9.0. I wouldn't increase index fill factor as an optimisation, unless you had the unusual situation of having very static data in the table. -- output = reverse(gro.mca @ enworbbc) http://linuxfinances.info/info/wp.html The world needs more people like us and fewer like them. -- Unknown -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
Thank you for the discussion. I'm on Postgres 8.4, and the hardware between Slony master and slave is identical, as is the autovacuum config. We do have transactions that fail to commit, transactions that roll back. I'm glad to have some idea of the cause of the difference in table size between Slony Master and Slave. If disk usage on the Master goes over 75% before my upgrade money is approved, I will try Bill Moran's suggestion of doing a practice vacuum run on the Slave, and then we'll take a maintenance window to VACUUM, VACUUM FULL, REINDEX; or CLUSTER on the master. THANK YOU! This is a super-helpful list. I really appreciate the positive energy in the PostgreSQL community. I'm looking forward to helping out at the PostgresSQL booth at the So Cal Linux Expo later this month. Thank you! Aleksey -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On Wed, Feb 2, 2011 at 10:45 AM, Chris Browne cbbro...@acm.org wrote: peter.geoghega...@gmail.com (Peter Geoghegan) writes: I'm not sure why you'd advocate CLUSTER as a way to reclaim disk space. Because it works pretty well; it reorganizes the table on the basis of the order indicated by one index, and simultaneously: a) Shortens the table, removing all dead space; b) Regenerates all indices, so they too have no dead space. It's important at this point to set fill factor before the cluster if something besides the default 100% makes sense. any randomly updated table full of small records will usually benefit from a fill fact even as high as 95% which is very little wasted space for a gain in HOT updates starting in 8.3. HOT saved our bacon at work. They really lowered the requirements for disk access / index update a LOT. I wish I'd have saved the pg_stat_index from 8.1 versus 8.3. And IO numbers. Our load dropped by a power of ten more or less. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On 2 February 2011 05:41, Scott Marlowe scott.marl...@gmail.com wrote: I wouldn't increase index fill factor as an optimisation, unless you had the unusual situation of having very static data in the table. That makes no sense whatsoever. You decrease fill factor (not increase btw) so there will be some space for future updates. If he's getting bloat it may well help quite a bit to have a lower than 100% fill factor. As I said, it depends on the profile of the data. Heavily or randomly updated tables will benefit from reducing *index* fillfactor - it will reduce index fragmentation. OTOH, indexes for static data can have their fillfactors increased to 100% from the default of 90% without consequence. -- Regards, Peter Geoghegan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On Wed, Feb 2, 2011 at 8:49 PM, Peter Geoghegan peter.geoghega...@gmail.com wrote: On 2 February 2011 05:41, Scott Marlowe scott.marl...@gmail.com wrote: I wouldn't increase index fill factor as an optimisation, unless you had the unusual situation of having very static data in the table. That makes no sense whatsoever. You decrease fill factor (not increase btw) so there will be some space for future updates. If he's getting bloat it may well help quite a bit to have a lower than 100% fill factor. As I said, it depends on the profile of the data. Heavily or randomly updated tables will benefit from reducing *index* fillfactor - it will reduce index fragmentation. OTOH, indexes for static data can have their fillfactors increased to 100% from the default of 90% without consequence. Certainly. I was talking table fill factor at the time, so that's why I wasn't sure what you meant. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On Mon, Jan 31, 2011 at 7:52 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Mon, Jan 31, 2011 at 5:54 PM, Aleksey Tsalolikhin atsaloli.t...@gmail.com wrote: Situation: Disk usage on production server root filesystem is at 68% utilization (80 GB used), on DR is at 51% (56 GB used). We use SlonyII-1.2.x to keep the DR up to date. I would like to account for the 24 GB difference. This is likely free space in your database. Some of it is completely normal and actually improves performance. Too much and your db is bloated and things starting taking too long. Thanks, Scott! Bucardo's check_postgres.pl --action bloat complains about one table, 1 GB wasted. So the other tables must be OK. So what about my DR, which doesn't have this same 20+ GB of free space. Will it acquire it once it goes into production? Will performance be impacted as it acquires the free space? Should I even be concerned about the difference in disk usage or is it normal and expected? How do I find out how much actual data I have in my database, minus the free space? Is there some built-in way to report this, or do I need to run SELECT * FROM * and look at the byte count of the output? Thanks, -at -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On 1 February 2011 03:52, Scott Marlowe scott.marl...@gmail.com wrote: You can reclaim that space by doing a cluster or vacuum full on the subject table. Yes, but this is a fairly bad idea, particularly prior to PG 9.0 . 9.0 has a new vacuum full implementation that makes it not so bad - it just rewrites the entire table. VACUUM FULL will take exclusive locks on tables being vacuumed. It also causes index bloat. You should be very careful about using it on a production system. I'm not sure why you'd advocate CLUSTER as a way to reclaim disk space. I wouldn't increase index fill factor as an optimisation, unless you had the unusual situation of having very static data in the table. -- Regards, Peter Geoghegan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On Tue, Feb 1, 2011 at 8:13 PM, Peter Geoghegan peter.geoghega...@gmail.com wrote: On 1 February 2011 03:52, Scott Marlowe scott.marl...@gmail.com wrote: You can reclaim that space by doing a cluster or vacuum full on the subject table. Yes, but this is a fairly bad idea, particularly prior to PG 9.0 . 9.0 has a new vacuum full implementation that makes it not so bad - it just rewrites the entire table. VACUUM FULL will take exclusive locks on tables being vacuumed. It also causes index bloat. You should be very careful about using it on a production system. I know these things. I'm pretty sure it's even in the docs by now. I'm not sure why you'd advocate CLUSTER as a way to reclaim disk space. Because it can reclaim disk space? http://www.postgresql.org/docs/8.4/static/sql-cluster.html I wouldn't increase index fill factor as an optimisation, unless you had the unusual situation of having very static data in the table. That makes no sense whatsoever. You decrease fill factor (not increase btw) so there will be some space for future updates. If he's getting bloat it may well help quite a bit to have a lower than 100% fill factor. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
More info here. http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On Tue, Feb 1, 2011 at 7:29 PM, Aleksey Tsalolikhin atsaloli.t...@gmail.com wrote: Bucardo's check_postgres.pl --action bloat complains about one table, 1 GB wasted. So the other tables must be OK. Pretty sure that unless you give it more args, the default for bloat check is to list the first bloated table and stop. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
-BEGIN PGP SIGNED MESSAGE- Hash: RIPEMD160 Bucardo's check_postgres.pl --action bloat complains about one table, 1 GB wasted. So the other tables must be OK. Pretty sure that unless you give it more args, the default for bloat check is to list the first bloated table and stop. No, it will show all tables over the given threshhold. However, the statement the other tables must be OK is definitely not a given, as the bloat calculation used by check_postgres is a very rough one. - -- Greg Sabino Mullane g...@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201102020206 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -BEGIN PGP SIGNATURE- iEYEAREDAAYFAk1JAqkACgkQvJuQZxSWSsiH0ACfZowR8lU2PJByBCyhsELpdozg 3SIAnjguAyRbjXxT8cSo6yZ8zar00TNZ =ji8B -END PGP SIGNATURE- -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Why does my DB size differ between Production and DR? (Postgres 8.4)
On Mon, Jan 31, 2011 at 5:54 PM, Aleksey Tsalolikhin atsaloli.t...@gmail.com wrote: Situation: Disk usage on production server root filesystem is at 68% utilization (80 GB used), on DR is at 51% (56 GB used). We use SlonyII-1.2.x to keep the DR up to date. I would like to account for the 24 GB difference. This is likely free space in your database. Some of it is completely normal and actually improves performance. Too much and your db is bloated and things starting taking too long. You can reclaim that space by doing a cluster or vacuum full on the subject table. Setting fill factor ahead of time to something in the 90% range should cut down on bloat as future updates can then happen in place, and also will improve performance of the system as updates happen in the same page and if they're not indexed don't require index updates as well (Heap Only Tuple updates, or HOT updates use this method) -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general