Re: [PERFORM] Replication Lag Causes
On 2 November 2014 05:33, Mike Wilson mfwil...@gmail.com wrote: Any recommendations would be very helpful.

Try using ionice and renice to increase the priority of the WAL sender process on the master. If that helps, you are lagging because the sender process is not getting enough resources (rather than, for example, the slave having trouble keeping up). Lowering the number of concurrent connections in your pgbouncer connection pool could also help here.

-- Stuart Bishop stu...@stuartbishop.net http://www.stuartbishop.net/
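As a rough sketch of how the WAL sender could be located before renicing it, assuming a 9.2-or-later master where pg_stat_replication exposes the sender backend's pid (9.1 called the column procpid):

  -- Run on the master; one row per connected standby.
  -- The pid column is the walsender backend's operating-system process id,
  -- which is what renice/ionice would be pointed at.
  SELECT pid, application_name, client_addr, state,
         sent_location, replay_location
  FROM pg_stat_replication;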
Re: [PERFORM] Replication Lag Causes
2014-11-02 19:16 GMT-02:00 Mike Wilson mfwil...@gmail.com:

Thanks for the information Greg. Unfortunately modifying the application stack this close to the holiday season won’t be an option, so I’m left with:
1) Trying to optimize the settings I have for the query mix I have.
2) Optimizing any long running DML queries (if any) to prevent lag due to locks.
3) Getting a better understanding of “what” causes lag.

#3 will probably be central to at least minimizing lag during heavy DML load. If anyone has a good resource describing when a slave would start to lag, that would help me hunt for the cause. I know long running DML on the master may cause lag, but I’m uncertain as to the specifics of why. During periods of lag we do have more DML than usual running against the master, but the queries themselves are very quick, although there might be 20-30 DML operations per second against some of our central tables that store user account information. Even under heavy DML the queries still return in under a second. Could a large volume of short running DML cause replication lag issues for large tables (~20M rows)?

Thanks again for your help. BDR looks interesting but probably too cutting edge for my client.

Mike Wilson

On Nov 2, 2014, at 12:33 PM, Greg Spiegelberg gspiegelb...@gmail.com wrote:

Hi Mike, Sounds very familiar. Our master fans out to 16 slaves (cascading) and we had great success with segregating database queries to different slaves, some based on network latency. I'd suggest, if possible, altering the application to use the slave for simple SELECTs and FUNCTIONs performing SELECT-like work only, while limiting those applications and queries that perform DML to the master (obviously). If the load on the slave increases too much, spin up another slave. I'd mention from experience that it could be the load on the slave that is giving the appearance of replication lag. This is what led us to having one slave per application. There is also the BDR multi-master available in 9.4beta if you're wanting to live on the edge. -Greg

On Sat, Nov 1, 2014 at 4:33 PM, Mike Wilson mfwil...@gmail.com wrote:

I have two 9.3.4 PG instances that back a large internet website that has very seasonal traffic and can generate large query loads. My instances are in a master-slave streaming replication setup and are stable and in general perform very well. The only issue we have with the boxes is that when the master is busy the slave may start to lag excessively. I can give specifics as to what heavily loaded means, and additionally the postgresql.conf for both boxes, but my basic questions are:
* What causes streaming replication lag to increase?
* What parameters can be tuned to reduce streaming replication lag?
* Can a loaded slave affect lag adversely?
* Can increasing max_wal_senders help reduce lag?
The reason I ask this is that, as mentioned above, the servers are stable and are real troopers in general as they back a very popular web site that puts the master under heavy seasonal load at times. At those times though we see an almost exponential growth in streaming replication lag compared to load on the master. For example, the master is a very beefy Solaris box:
* 4 recent Intel Xeons (16 physical cores)
* 256 GB of ECC RAM
* 12 TB of ZFS (spindle and SSD internal storage)
* DB on-disk size is 2 TB
* ZFS ARC cache of roughly 250G.
* ZFS ARC2/ZIL configured for SSD’s (this is awesome by the way) Basic PG Config: shared_buffers = 2GB work_mem = 128MB max_connections = 1700 (supports roughly 100 web servers) wal_keep_segments = 256 (roughly enough for 24 hours of operation under heavy load) wal_sender_timeout = 60s replication_timeout=(not set) wal_receiver_status_interval=10s max_wal_senders=6 * wal archiving is off * 98% of the queries on the master complete in under 500ms. * No hung or very long running queries in general. The master on a normal day maintains a load of about 0.5, during which replication lag to the slave is in hundreds milliseconds. When the production db server is heavily hit though the load may go as high as 4 on the master and the streaming replication lag may increase to more than 2 hours relatively quickly. Load on the slave is generally below 1 even when the master is heavily loaded. The traffic to the master is primarily read with about 10% DML (new users, purchase records, etc). DML statements increase proportionally when under load though. The master and slave are connected via dedicated 10G fiber link and even under heavy load the utilization of the link is nowhere near close to saturation. BTW, the slave does run some reported related queries throughout the day that might take up to a minute to complete. I have the task of figuring out why this otherwise healthy DB starts to lag so badly under
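For anyone wanting to put numbers on the lag described above, a minimal sketch using the 9.3-era monitoring functions (the xlog-based names assume 9.3; later major versions renamed them):

  -- On the standby: approximate replay delay in wall-clock time.
  -- This reflects the last replayed commit, so it can look large on an idle master.
  SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;

  -- On the master: bytes of WAL each standby still has to replay.
  SELECT client_addr,
         pg_xlog_location_diff(pg_current_xlog_location(), replay_location) AS replay_lag_bytes
  FROM pg_stat_replication;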
Re: [PERFORM] Replication Lag Causes
Load on the slave is relatively light. It averages about 1.0 due to some data ware house select queries running against it frequently. Previously only the load on the master seems to have affected our replication lag no matter what the slave was doing. In thinking about this a bit more, the load on the master does cause increasing lag but only if the query mix begins to change to more DML than SELECTS. Basically, the amount of DML is what really appears to cause the replication to lag. This is an OLTP system backing a rather heavy commercial website where memberships are sold and when the purchase traffic increases that is when we start to see extreme lag develop on the slave. CPU utilization on the slave during extreme lag is similar to normal operation even if the slave is lagging more than usual. Thanks for the info on max_wal_senders. That’s good to know. Mike Wilson On Nov 1, 2014, at 4:14 PM, Valentine Gogichashvili val...@gmail.com wrote: Hello Mike, what kind of load does the slave get? what does the recovery process do on the slave during the times when lag is being observed? Does it use 100% of the CPU? WAL can be replayed by only one process, so no need to increase the max_wal_senders. Cheers, -- Valentine Gogichashvili On Sun, Nov 2, 2014 at 1:33 AM, Mike Wilson mfwil...@gmail.com mailto:mfwil...@gmail.com wrote: I have two 9.3.4 PG instances that back a large internet website that has very seasonal traffic and can generate large query loads. My instances are in a master-slave streaming replication setup and are stable and in general perform very well. The only issues we have with the boxes is that when the master is busy the slave may start to lag excessively. I can give specifics as to what heavily loaded means and additionally the postgresql.conf for both boxes but my basic questions are: * What causes streaming replication lag to increase? * What parameters can be tuned to reduce streaming replication lag? * Can a loaded slave affect lag adversely? * Can increasing max_wal_senders help reduce lag? The reason I ask this is that as mentioned above the servers are stable and are real troopers in general as they back a very popular web site that puts the master under heavy seasonal load at times. At those times though we see an almost exponential growth in streaming replication lag compared to load on the master. For example, the master is a very beefy Solaris: * 4 Recent Intel Zeons (16 physical cores) * 256 GB of ECC RAM * 12 TB of ZFS (spindle and SSD internal storage) * DB on disk size is 2TB * ZFS ARC cache of roughly 250G. * ZFS ARC2/ZIL configured for SSD’s (this is awesome by the way) Basic PG Config: shared_buffers = 2GB work_mem = 128MB max_connections = 1700 (supports roughly 100 web servers) wal_keep_segments = 256 (roughly enough for 24 hours of operation under heavy load) wal_sender_timeout = 60s replication_timeout=(not set) wal_receiver_status_interval=10s max_wal_senders=6 * wal archiving is off * 98% of the queries on the master complete in under 500ms. * No hung or very long running queries in general. The master on a normal day maintains a load of about 0.5, during which replication lag to the slave is in hundreds milliseconds. When the production db server is heavily hit though the load may go as high as 4 on the master and the streaming replication lag may increase to more than 2 hours relatively quickly. Load on the slave is generally below 1 even when the master is heavily loaded. 
The traffic to the master is primarily read with about 10% DML (new users, purchase records, etc). DML statements increase proportionally when under load though. The master and slave are connected via dedicated 10G fiber link and even under heavy load the utilization of the link is nowhere near close to saturation. BTW, the slave does run some reported related queries throughout the day that might take up to a minute to complete. I have the task of figuring out why this otherwise healthy DB starts to lag so badly under load and if there is anything that we could do about it. I’ve been wondering particularly if we should up the max_wal_senders but from the docs it is unclear if that would help. In my testing with pg_bench on our dev boxes which were the previous production hardware for these servers I have determined that it doesn’t take much DML load on the master to get the slave to start lagging severely. I was wondering if this was expected and/or some design consideration? Possibly streaming replication isn’t meant to be used for heavily hit databases and maintain small lag times? I would like to believe that the fault is something we have done though and that there is some parameter we could tune to reduce this lag. Any recommendations would be very
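Since the slave also runs reporting queries that take up to a minute, one thing worth ruling out is WAL replay pausing behind those queries. A hedged sketch for a 9.1-or-later standby:

  -- On the standby: how often queries have been cancelled because replay could not wait.
  SELECT datname, confl_snapshot, confl_lock, confl_bufferpin, confl_tablespace
  FROM pg_stat_database_conflicts;

  -- Settings that trade replay delay against query cancellation on the standby.
  SHOW max_standby_streaming_delay;
  SHOW hot_standby_feedback;

If max_standby_streaming_delay is large (or -1), replay will wait behind conflicting reporting queries instead of cancelling them, and that waiting shows up as replication lag.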
[PERFORM] Replication Lag Causes
I have two 9.3.4 PG instances that back a large internet website that has very seasonal traffic and can generate large query loads. My instances are in a master-slave streaming replication setup and are stable and in general perform very well. The only issue we have with the boxes is that when the master is busy the slave may start to lag excessively. I can give specifics as to what heavily loaded means, and additionally the postgresql.conf for both boxes, but my basic questions are:

* What causes streaming replication lag to increase?
* What parameters can be tuned to reduce streaming replication lag?
* Can a loaded slave affect lag adversely?
* Can increasing max_wal_senders help reduce lag?

The reason I ask this is that, as mentioned above, the servers are stable and are real troopers in general as they back a very popular web site that puts the master under heavy seasonal load at times. At those times though we see an almost exponential growth in streaming replication lag compared to load on the master. For example, the master is a very beefy Solaris box:

* 4 recent Intel Xeons (16 physical cores)
* 256 GB of ECC RAM
* 12 TB of ZFS (spindle and SSD internal storage)
* DB on-disk size is 2 TB
* ZFS ARC cache of roughly 250G
* ZFS L2ARC/ZIL configured on SSDs (this is awesome, by the way)

Basic PG config:

shared_buffers = 2GB
work_mem = 128MB
max_connections = 1700 (supports roughly 100 web servers)
wal_keep_segments = 256 (roughly enough for 24 hours of operation under heavy load)
wal_sender_timeout = 60s
replication_timeout = (not set)
wal_receiver_status_interval = 10s
max_wal_senders = 6

* WAL archiving is off.
* 98% of the queries on the master complete in under 500 ms.
* No hung or very long running queries in general.

The master on a normal day maintains a load of about 0.5, during which replication lag to the slave is in the hundreds of milliseconds. When the production DB server is heavily hit, though, the load may go as high as 4 on the master and the streaming replication lag may increase to more than 2 hours relatively quickly. Load on the slave is generally below 1 even when the master is heavily loaded.

The traffic to the master is primarily reads, with about 10% DML (new users, purchase records, etc). DML statements increase proportionally when under load though. The master and slave are connected via a dedicated 10G fiber link, and even under heavy load the utilization of the link is nowhere near saturation. BTW, the slave does run some report-related queries throughout the day that might take up to a minute to complete.

I have the task of figuring out why this otherwise healthy DB starts to lag so badly under load and if there is anything that we could do about it. I’ve been wondering particularly if we should raise max_wal_senders, but from the docs it is unclear if that would help. In my testing with pgbench on our dev boxes, which were the previous production hardware for these servers, I have determined that it doesn’t take much DML load on the master to get the slave to start lagging severely. I was wondering if this was expected and/or some design consideration? Possibly streaming replication isn’t meant to be used for heavily hit databases while maintaining small lag times? I would like to believe that the fault is something we have done, though, and that there is some parameter we could tune to reduce this lag.

Any recommendations would be very helpful.

Mike Wilson
Predicate Logic Consulting
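One way to narrow down which stage is falling behind is to compare the WAL positions the master reports for each standby: a large gap before sent_location points at the sender or network, while a large gap between flush_location and replay_location points at replay on the slave (a single startup process that can stall on I/O or on conflicts with standby queries). A sketch against the 9.3 pg_stat_replication view:

  SELECT application_name,
         pg_xlog_location_diff(pg_current_xlog_location(), sent_location) AS not_yet_sent,
         pg_xlog_location_diff(sent_location, flush_location)             AS sent_not_flushed,
         pg_xlog_location_diff(flush_location, replay_location)           AS flushed_not_replayed
  FROM pg_stat_replication;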
Re: [PERFORM] Replication Lag Causes
Hello Mike, what kind of load does the slave get? what does the recovery process do on the slave during the times when lag is being observed? Does it use 100% of the CPU? WAL can be replayed by only one process, so no need to increase the max_wal_senders. Cheers, -- Valentine Gogichashvili On Sun, Nov 2, 2014 at 1:33 AM, Mike Wilson mfwil...@gmail.com wrote: I have two 9.3.4 PG instances that back a large internet website that has very seasonal traffic and can generate large query loads. My instances are in a master-slave streaming replication setup and are stable and in general perform very well. The only issues we have with the boxes is that when the master is busy the slave may start to lag excessively. I can give specifics as to what heavily loaded means and additionally the postgresql.conf for both boxes but my basic questions are: * What causes streaming replication lag to increase? * What parameters can be tuned to reduce streaming replication lag? * Can a loaded slave affect lag adversely? * Can increasing max_wal_senders help reduce lag? The reason I ask this is that as mentioned above the servers are stable and are real troopers in general as they back a very popular web site that puts the master under heavy seasonal load at times. At those times though we see an almost exponential growth in streaming replication lag compared to load on the master. For example, the master is a very beefy Solaris: * 4 Recent Intel Zeons (16 physical cores) * 256 GB of ECC RAM * 12 TB of ZFS (spindle and SSD internal storage) * DB on disk size is 2TB * ZFS ARC cache of roughly 250G. * ZFS ARC2/ZIL configured for SSD’s (this is awesome by the way) Basic PG Config: shared_buffers = 2GB work_mem = 128MB max_connections = 1700 (supports roughly 100 web servers) wal_keep_segments = 256 (roughly enough for 24 hours of operation under heavy load) wal_sender_timeout = 60s replication_timeout=(not set) wal_receiver_status_interval=10s max_wal_senders=6 * wal archiving is off * 98% of the queries on the master complete in under 500ms. * No hung or very long running queries in general. The master on a normal day maintains a load of about 0.5, during which replication lag to the slave is in hundreds milliseconds. When the production db server is heavily hit though the load may go as high as 4 on the master and the streaming replication lag may increase to more than 2 hours relatively quickly. Load on the slave is generally below 1 even when the master is heavily loaded. The traffic to the master is primarily read with about 10% DML (new users, purchase records, etc). DML statements increase proportionally when under load though. The master and slave are connected via dedicated 10G fiber link and even under heavy load the utilization of the link is nowhere near close to saturation. BTW, the slave does run some reported related queries throughout the day that might take up to a minute to complete. I have the task of figuring out why this otherwise healthy DB starts to lag so badly under load and if there is anything that we could do about it. I’ve been wondering particularly if we should up the max_wal_senders but from the docs it is unclear if that would help. In my testing with pg_bench on our dev boxes which were the previous production hardware for these servers I have determined that it doesn’t take much DML load on the master to get the slave to start lagging severely. I was wondering if this was expected and/or some design consideration? 
Possibly streaming replication isn’t meant to be used for heavily hit databases and maintain small lag times? I would like to believe that the fault is something we have done though and that there is some parameter we could tune to reduce this lag. Any recommendations would be very helpful. Mike Wilson Predicate Logic Consulting
Re: [PERFORM] Replication Syatem
On Wed, Apr 30, 2008 at 11:09 AM, Gauri Kanekar [EMAIL PROTECTED] wrote:

 relid | relname | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_dead_tup
-------+---------+-----------+-----------+---------------+------------
 16461 | table1  |         0 |   8352496 |          5389 |    8351242

Hmm.. So indeed there are very few HOT updates. What is the fillfactor you are using for these tests? If it's much less than 100, the very low percentage of HOT updates would make me guess that you are updating one of the index columns. Otherwise at least the initial updates, until you fill up the free space, should be HOT.

Thanks, Pavan

-- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
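A small sketch of how the HOT ratio and the fillfactor settings can be checked together, assuming 8.3 (reloptions holds per-relation storage parameters; an empty reloptions means the default fillfactor of 100):

  SELECT n_tup_upd, n_tup_hot_upd,
         round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_update_pct
  FROM pg_stat_user_tables
  WHERE relname = 'table1';

  SELECT relname, reloptions
  FROM pg_class
  WHERE relname IN ('table1', 'idx1', 'idx2', 'idx3', 'idx4');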
Re: [PERFORM] Replication Syatem
table1 structure:

id      integer                 not null
code    integer                 not null
crid    integer                 not null
status  character varying(1)    default 'A'::character varying
delta1  bigint                  default 0
delta2  bigint                  default 0
delta3  bigint                  default 0
delta4  bigint                  default 0
tz_id   integer                 default 0

Indexes:
idx1 PRIMARY KEY, btree (id)
idx2 UNIQUE, btree (code, crid)
idx3 btree (tz_id)
idx4 btree (status)

code and crid are foreign keys.

update table1 set delta1 = 100 where code/100 = 999;

On Wed, Apr 30, 2008 at 12:16 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: fillfactor is set to 80 as you suggested. The delta* fields are updated, and these fields are in no way related to any of the index fields.

On Wed, Apr 30, 2008 at 12:13 PM, Pavan Deolasee [EMAIL PROTECTED] wrote: On Wed, Apr 30, 2008 at 11:09 AM, Gauri Kanekar [EMAIL PROTECTED] wrote:

 relid | relname | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_dead_tup
-------+---------+-----------+-----------+---------------+------------
 16461 | table1  |         0 |   8352496 |          5389 |    8351242

Hmm.. So indeed there are very few HOT updates. What is the fillfactor you are using for these tests? If it's much less than 100, the very low percentage of HOT updates would make me guess that you are updating one of the index columns. Otherwise at least the initial updates, until you fill up the free space, should be HOT. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com

-- Regards Gauri
Re: [PERFORM] Replication Syatem
On Wed, Apr 30, 2008 at 12:16 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: fillfactor is set to 80 as you suggested. delta* fields r updated and these fields are no where related to any of the index fields. That's weird. With that fillfactor, you should have a very high percentage of HOT update ratio. It could be a very special case that we might be looking at. I think a self contained test case or a very detail explanation of the exact usage is what we need to explain this behavior. You may also try dropping non-critical indexes and test again. Btw, I haven't been able to reproduce this at my end. With the given indexes and kind of updates, I get very high percentage of HOT updates. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Gauri Kanekar wrote: HOT doesn't seem to be working in our case. This is table1's structure:

id      integer                 not null
code    integer                 not null
crid    integer                 not null
status  character varying(1)    default 'A'::character varying
delta1  bigint                  default 0
delta2  bigint                  default 0
delta3  bigint                  default 0
delta4  bigint                  default 0
tz_id   integer                 default 0

Indexes:
idx1 PRIMARY KEY, btree (id)
idx2 UNIQUE, btree (code, crid)
idx3 btree (tz_id)
idx4 btree (status)

code and crid are foreign keys. Here the delta* fields get updated throughout the day, and most of the time the same row may be updated again and again. table1 contains around 12843694 records. Now I don't understand why HOT doesn't work in our case. Changed fillfactor to 80, 75, 70 but nothing seems to work.

Did you dump and reload the table after setting the fill factor? It only affects newly inserted data. Another possibility is that there's a long running transaction in the background, preventing HOT/vacuum from reclaiming the dead tuples.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
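A quick way to check the second possibility, a long-running transaction pinning dead tuples, is a sketch like this (the column names are the 8.3-era ones, procpid and current_query; later releases use pid and query):

  SELECT procpid, usename, xact_start, current_query
  FROM pg_stat_activity
  WHERE xact_start IS NOT NULL
    AND xact_start < now() - interval '30 minutes'
  ORDER BY xact_start;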
Re: [PERFORM] Replication Syatem
Heikki Linnakangas wrote: Did you dump and reload the table after setting the fill factor? It only affects newly inserted data. VACUUM FULL or CLUSTER should do the job too, right? After all, they recreate the table so they must take the fillfactor into account. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Craig Ringer wrote: Heikki Linnakangas wrote: Did you dump and reload the table after setting the fill factor? It only affects newly inserted data. VACUUM FULL or CLUSTER should do the job too, right? After all, they recreate the table so they must take the fillfactor into account. CLUSTER, yes. VACUUM FULL won't move tuples around just to make room for the fillfactor. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Please keep list in the loop. On Wed, Apr 30, 2008 at 6:45 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: Hi, We have recreated the indices with fillfactor set to 80, which has improved HOT a little, Wait. Did you say, you recreated the indexes with fill factor ? That's no help for HOT. You need to recreate the TABLEs with a fill factor. And as Heikki pointed out, you need to dump and reload, just altering the table won't affect the current data. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
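For reference, a sketch of setting the fillfactor on the table itself and rewriting the existing rows so they honour it. ALTER TABLE ... SET (fillfactor) is 8.2+ syntax, and CLUSTER takes an exclusive lock for the duration of the rewrite, so this needs a maintenance window; clustering on idx1 is just an example choice:

  ALTER TABLE table1 SET (fillfactor = 80);
  -- Rewrite the table so existing rows are laid out with the new fillfactor.
  CLUSTER idx1 ON table1;
  ANALYZE table1;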
Re: [PERFORM] Replication Syatem
Pavan Deolasee [EMAIL PROTECTED] writes: That's weird. With that fillfactor, you should have a very high percentage of HOT update ratio. It could be a very special case that we might be looking at. He's testing update table1 set delta1 = 100 where code/100 =999; so all the rows being updated fall into a contiguous range of code values. If the table was loaded in such a way that those rows were also physically contiguous, then the updates would be localized and would very soon run out of freespace on those pages. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Frank Ch. Eigler) writes: Tom Lane [EMAIL PROTECTED] writes: Also, you need to make sure you have the FSM parameters set high enough so that all the free space found by a VACUUM run can be remembered. Would it be difficult to arrange FSM parameters to be automatically set from the VACUUM reclaim results? Yeah, because the problem is that FSM is kept in shared memory which cannot be resized on-the-fly. In retrospect, trying to keep FSM in shared memory was a spectacularly bad idea (one for which I take full blame). There is work afoot to push it out to disk so that the whole problem goes away; so I don't see much point in worrying about band-aid solutions. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
On Wed, Apr 30, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote: Pavan Deolasee [EMAIL PROTECTED] writes: That's weird. With that fillfactor, you should have a very high percentage of HOT update ratio. It could be a very special case that we might be looking at. He's testing It's She :-) Oh yes. Apologies if I sounded harsh; did not mean that. I was just completely confused why she is not seeing the HOT updates. update table1 set delta1 = 100 where code/100 =999; so all the rows being updated fall into a contiguous range of code values. If the table was loaded in such a way that those rows were also physically contiguous, then the updates would be localized and would very soon run out of freespace on those pages. Yeah, that seems like the pattern. I tested with the similar layout and a fill factor 80. The initial few bulk updates had comparatively less HOT updates (somewhere 20-25%), But within 4-5 iterations of updating the same set of rows, HOT updates were 90-95%. That's because after few iterations (and because of non-HOT updates) the tuples get scattered in various blocks, thus improving chances of HOT updates. I guess the reason probably is that she is using fill factor for indexes and not heap, but she hasn't yet confirmed. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
We have tried fillfactor for indices and it seems to work. Need to try fillfactor for table. May for that reason the bulk update queries don't get the advantage of HOT :) On Wed, Apr 30, 2008 at 9:45 PM, Pavan Deolasee [EMAIL PROTECTED] wrote: On Wed, Apr 30, 2008 at 8:16 PM, Tom Lane [EMAIL PROTECTED] wrote: Pavan Deolasee [EMAIL PROTECTED] writes: That's weird. With that fillfactor, you should have a very high percentage of HOT update ratio. It could be a very special case that we might be looking at. He's testing It's She :-) Oh yes. Apologies if I sounded harsh; did not mean that. I was just completely confused why she is not seeing the HOT updates. update table1 set delta1 = 100 where code/100 =999; so all the rows being updated fall into a contiguous range of code values. If the table was loaded in such a way that those rows were also physically contiguous, then the updates would be localized and would very soon run out of freespace on those pages. Yeah, that seems like the pattern. I tested with the similar layout and a fill factor 80. The initial few bulk updates had comparatively less HOT updates (somewhere 20-25%), But within 4-5 iterations of updating the same set of rows, HOT updates were 90-95%. That's because after few iterations (and because of non-HOT updates) the tuples get scattered in various blocks, thus improving chances of HOT updates. I guess the reason probably is that she is using fill factor for indexes and not heap, but she hasn't yet confirmed. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Regards Gauri
Re: [PERFORM] Replication Syatem
On Tue, 29 Apr 2008, Gauri Kanekar wrote: We do vacuum full, as vacuum verbose analyse dont regain space for us. Ah, now we're getting to the root of your problem here. You expect that VACUUM should reclaim space. Whenever you UPDATE a row, it writes a new one out, then switches to use that version. This leaves behind the original. Those now unused rows are what VACUUM gathers, but it doesn't give that space back to the operating system. The model here assumes that you'll need that space again for the next time you UPDATE or INSERT a row. So instead VACUUM just keeps those available for database reuse rather than returning it to the operating system. Now, if you don't VACUUM frequently enough, this model breaks down, and the table can get bigger with space that may never get reused. The idea is that you should be VACUUMing up now unneeded rows at about the same rate they're being re-used. When you don't keep up, the database can expand in space that you don't get back again. The right answer to this problem is not to use VACUUM FULL; it's to use regular VACUUM more often. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
From most of the replies, it sounds like upgrading to a higher version of Postgres, perhaps 8.3.1, may be one solution to tackle this problem. I have checked on the HOT feature in 8.3.1. Do we need to make any special config changes or any other settings for HOT to work? Are there any special guidelines to follow to make HOT work?

~ Gauri

On Tue, Apr 29, 2008 at 2:07 PM, Greg Smith [EMAIL PROTECTED] wrote: On Tue, 29 Apr 2008, Gauri Kanekar wrote: We do vacuum full, as vacuum verbose analyse dont regain space for us. Ah, now we're getting to the root of your problem here. You expect that VACUUM should reclaim space. Whenever you UPDATE a row, it writes a new one out, then switches to use that version. This leaves behind the original. Those now unused rows are what VACUUM gathers, but it doesn't give that space back to the operating system. The model here assumes that you'll need that space again for the next time you UPDATE or INSERT a row. So instead VACUUM just keeps those available for database reuse rather than returning it to the operating system. Now, if you don't VACUUM frequently enough, this model breaks down, and the table can get bigger with space that may never get reused. The idea is that you should be VACUUMing up now unneeded rows at about the same rate they're being re-used. When you don't keep up, the database can expand in space that you don't get back again. The right answer to this problem is not to use VACUUM FULL; it's to use regular VACUUM more often. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

-- Regards Gauri
Re: [PERFORM] Replication Syatem
Gauri Kanekar wrote: Andrew, Can you explain me in detail why u said vacuum full is making the things worst. We do vacuum full, as vacuum verbose analyse dont regain space for us. vacuum full stops all access so that the data files can be re-writen without the unused space. normal vacuum will update the records of what space is no longer used so that it can then be reused with the next update/insert. Your db size will not shrink straight away but it will stop growing until you use all the free space left from previous update/delete The more frequently you do a normal vacuum the less time it will take and things will run a lot smoother with your file size growing slowly to accommodate new data. Expanding on what others have mentioned as a drawback of vacuum full - you should look at REINDEX'ing as well (maybe one index or table at a time). You will most likely find this will reclaim some disk space for you as well. -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
On Tue, Apr 29, 2008 at 4:35 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: Do we need to do any special config changes or any other setting for HOT to work?? No. HOT is enabled by default, on all tables. There is no way, and no need, to disable it.

Any special guideline to follow to make HOT working?? You can do a couple of things to benefit from HOT.

1. HOT addresses a special, but common, case where an UPDATE operation does not change any of the index keys. So check if your UPDATE changes any of the index keys. If so, see if you can avoid having an index involving that column. Of course, I won't advocate dropping an index if it would drastically impact your frequently run queries.

2. You may leave some free space in the heap (fillfactor less than 100). My recommendation would be to leave space worth one row, or slightly more than that, to let the first UPDATE be a HOT update. Subsequent UPDATEs in the page may reuse the dead row created by earlier UPDATEs.

3. Avoid any long running transactions.

Thanks, Pavan

-- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com
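For point 1 above, a small sketch to list table1's indexes and the columns they cover, so the UPDATE statements can be checked against them:

  SELECT indexrelid::regclass AS index_name,
         pg_get_indexdef(indexrelid) AS definition
  FROM pg_index
  WHERE indrelid = 'table1'::regclass;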
Re: [PERFORM] Replication Syatem
Thanks for the help. Need some more help.

table1 has two indices:
unique indx1 = pkfld
unique indx2 = fkfld1, fkfld2

Did the following steps in the listed order:
1. Vacuumed the whole DB.
2. table1 record count == 11970789, size == 2702.41 MB
3. update table1 set fld7 = 1000 where fld1/100 = 999; this UPDATED 1230307 records.
4. Checked table1 size again: record count == 11970789, size == 2996.57 MB
5. Ran the update again: update table1 set fld7 = 1000 where fld1/100 = 999; this UPDATED 1230307 records.
6. table1 size: record count == 11970789, size == 3290.64 MB
7. Updated again: update table1 set fld7 = 1000 where fld1/100 = 999; this UPDATED 1230307 records.
8. table1 size: record count == 11970789, size == 3584.66 MB

Found that the size increased gradually. Is HOT working over here? Guide me if I'm doing something wrong.

~ Gauri

On Tue, Apr 29, 2008 at 4:55 PM, Pavan Deolasee [EMAIL PROTECTED] wrote: On Tue, Apr 29, 2008 at 4:35 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: Do we need to do any special config changes or any other setting for HOT to work?? No. HOT is enabled by default, on all tables. There is no way, and no need, to disable it. Any special guideline to follow to make HOT working?? You can do a couple of things to benefit from HOT. 1. HOT addresses a special, but common, case where an UPDATE operation does not change any of the index keys. So check if your UPDATE changes any of the index keys. If so, see if you can avoid having an index involving that column. Of course, I won't advocate dropping an index if it would drastically impact your frequently run queries. 2. You may leave some free space in the heap (fillfactor less than 100). My recommendation would be to leave space worth one row, or slightly more than that, to let the first UPDATE be a HOT update. Subsequent UPDATEs in the page may reuse the dead row created by earlier UPDATEs. 3. Avoid any long running transactions. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com

-- Regards Gauri
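To see whether HOT kicks in between those update runs, one option is to sample the heap size and the update counters after each step; a sketch assuming 8.3 statistics columns (pg_relation_size measures the table's main heap only, excluding indexes):

  SELECT pg_size_pretty(pg_relation_size('table1')) AS heap_size,
         n_tup_upd, n_tup_hot_upd, n_dead_tup
  FROM pg_stat_user_tables
  WHERE relname = 'table1';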
Re: [PERFORM] Replication Syatem
Gauri Kanekar escribió: Do we need to do any special config changes or any other setting for HOT to work?? No. HOT is always working, if it can. You don't need to configure it. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Gauri Kanekar escribió: Found that the size increased gradually. Is HOT working over here ?? Guide me if im doing something wrong. Probably not. Try vacuuming between the updates. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
On Tue, Apr 29, 2008 at 6:29 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: Found that the size increased gradually. Is HOT working over here ?? Guide me if im doing something wrong. You have chosen a bad case for HOT. Since you are repeatedly updating the same set of rows, the dead space created in the first step is the blocks which are not touched in the subsequent updates. Is this a real scenario or are you just testing ? If its just for testing, I would suggest updating different sets of rows in each step and then check. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Thats how our updates works. We usually tend to touch the same row many times a day. ~ Gauri On Tue, Apr 29, 2008 at 6:39 PM, Pavan Deolasee [EMAIL PROTECTED] wrote: On Tue, Apr 29, 2008 at 6:29 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: Found that the size increased gradually. Is HOT working over here ?? Guide me if im doing something wrong. You have chosen a bad case for HOT. Since you are repeatedly updating the same set of rows, the dead space created in the first step is the blocks which are not touched in the subsequent updates. Is this a real scenario or are you just testing ? If its just for testing, I would suggest updating different sets of rows in each step and then check. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Regards Gauri
Re: [PERFORM] Replication Syatem
On Tue, Apr 29, 2008 at 6:42 PM, Gauri Kanekar [EMAIL PROTECTED] wrote: Thats how our updates works. We usually tend to touch the same row many times a day. Then start with a non-100 fillfactor. I would suggest something like 80 and then adjust based on the testing. Since you are anyways have a update intensive setup, leaving free space in the heap won't harm you much in the long term. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Pavan Deolasee [EMAIL PROTECTED] writes: Any special guideline to follow to make HOT working?? You can do couple of things to benefit from HOT. 1. HOT addresses a special, but common case where UPDATE operation does not change any of the index keys. So check if your UPDATE changes any of the index keys. If so, see if you can avoid having index involving that column. Of course, I won't advocate dropping an index if it would drastically impact your frequently run queries. 2. You may leave some free space in the heap (fillfactor less than 100). My recommendation would be to leave space worth of one row or slightly more than that to let first UPDATE be an HOT update. Subsequent UPDATEs in the page may reuse the dead row created by earlier UPDATEs. 3. Avoid any long running transactions. Perhaps we should put this list in the FAQ. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's 24x7 Postgres support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Greg Smith [EMAIL PROTECTED] writes: The model here assumes that you'll need that space again for the next time you UPDATE or INSERT a row. So instead VACUUM just keeps those available for database reuse rather than returning it to the operating system. Now, if you don't VACUUM frequently enough, this model breaks down, and the table can get bigger with space that may never get reused. The idea is that you should be VACUUMing up now unneeded rows at about the same rate they're being re-used. When you don't keep up, the database can expand in space that you don't get back again. The right answer to this problem is not to use VACUUM FULL; it's to use regular VACUUM more often. Also, you need to make sure you have the FSM parameters set high enough so that all the free space found by a VACUUM run can be remembered. The less often you run VACUUM, the more FSM space you need, because there'll be more free space reclaimed per run. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
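In the pre-8.4 releases discussed here, whether the FSM settings are large enough can be checked from a database-wide VACUUM VERBOSE, whose final lines report how much free space map the cluster actually needs; a minimal sketch:

  SHOW max_fsm_pages;
  SHOW max_fsm_relations;
  -- Run as superuser; the last few lines of output report the number of pages
  -- and relations the free space map would need, for comparison with the settings above.
  VACUUM VERBOSE;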
Re: [PERFORM] Replication Syatem
On Apr 29, 2008, at 10:16 AM, Tom Lane wrote: Greg Smith [EMAIL PROTECTED] writes: The model here assumes that you'll need that space again for the next time you UPDATE or INSERT a row. So instead VACUUM just keeps those available for database reuse rather than returning it to the operating system. [ ... ] Also, you need to make sure you have the FSM parameters set high enough so that all the free space found by a VACUUM run can be remembered. The less often you run VACUUM, the more FSM space you need, because there'll be more free space reclaimed per run. I can actually watch one of our applications slow down once the free space in the table is used up. Extending the data file seems to be much more expensive than using the free space found in existing pages of the file. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Gauri Kanekar) writes: Basically we have some background process which updates table1 and we don't want the application to make any changes to table1 while vacuum. Vacuum requires exclusive lock on table1 and if any of the background or application is ON vacuum don't kick off. Thats the reason we need to get the site down. VACUUM has not required an exclusive lock on tables since version 7.1. What version of PostgreSQL are you running? -- output = (cbbrowne @ acm.org) http://linuxdatabases.info/info/sap.html Rules of the Evil Overlord #192. If I appoint someone as my consort, I will not subsequently inform her that she is being replaced by a younger, more attractive woman. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Alvaro Herrera wrote: Gauri Kanekar escribió: Do we need to do any special config changes or any other setting for HOT to work?? No. HOT is always working, if it can. You don't need to configure it. Unless you have upgraded since you started this thread you are still running 8.1.3. HOT is only available in 8.3 and 8.3.1 You DO need to upgrade to get the benefits of HOT -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
HOT doesn't seem to be working in our case. This is table1's structure:

id      integer                 not null
code    integer                 not null
crid    integer                 not null
status  character varying(1)    default 'A'::character varying
delta1  bigint                  default 0
delta2  bigint                  default 0
delta3  bigint                  default 0
delta4  bigint                  default 0
tz_id   integer                 default 0

Indexes:
idx1 PRIMARY KEY, btree (id)
idx2 UNIQUE, btree (code, crid)
idx3 btree (tz_id)
idx4 btree (status)

code and crid are foreign keys.

Here the delta* fields get updated throughout the day, and most of the time the same row may be updated again and again. table1 contains around 12843694 records. Now I don't understand why HOT doesn't work in our case. Changed fillfactor to 80, 75, 70 but nothing seems to work.

~Gauri

On Tue, Apr 29, 2008 at 10:18 PM, Shane Ambler [EMAIL PROTECTED] wrote: Alvaro Herrera wrote: Gauri Kanekar escribió: Do we need to do any special config changes or any other setting for HOT to work?? No. HOT is always working, if it can. You don't need to configure it. Unless you have upgraded since you started this thread you are still running 8.1.3. HOT is only available in 8.3 and 8.3.1. You DO need to upgrade to get the benefits of HOT. -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz

-- Regards Gauri
Re: [PERFORM] Replication Syatem
On Wed, Apr 30, 2008 at 10:59 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: HOT doesn't seems to be working in our case. Can you please post output of the following query ? SELECT relid, relname, n_tup_ins, n_tup_upd, n_tup_hot_upd, n_dead_tup from pg_stat_user_tables WHERE relname = 'table1'; Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
 relid | relname | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_dead_tup
-------+---------+-----------+-----------+---------------+------------
 16461 | table1  |         0 |   8352496 |          5389 |    8351242

On Wed, Apr 30, 2008 at 11:07 AM, Pavan Deolasee [EMAIL PROTECTED] wrote: On Wed, Apr 30, 2008 at 10:59 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: HOT doesn't seems to be working in our case. Can you please post output of the following query ? SELECT relid, relname, n_tup_ins, n_tup_upd, n_tup_hot_upd, n_dead_tup from pg_stat_user_tables WHERE relname = 'table1'; Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com

-- Regards Gauri
[PERFORM] Replication Syatem
All, We have a table, table1, which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which would be the best suited replication solution for this?

Thanks for any help.

~ Gauri
Re: [PERFORM] Replication Syatem
2008/4/28 Gauri Kanekar [EMAIL PROTECTED]: All, We have a table, table1, which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which would be the best suited replication solution for this? Thanks for any help. ~ Gauri

I hope you're not using VACUUM FULL... (standard reply for this type of question). What version of PostgreSQL are you using? Have you tried autovacuum? Run plain vacuum on this table even more often (like every half hour) and it should not take as long, and it will save space. If you still have trouble, run vacuum analyse verbose table1; and see what it says. If you're doing it right you should be able to vacuum with the database up.

Sounds like you might be happier with a fix for the problem rather than a complex workaround which will actually solve a completely different problem.

Regards Peter.
Re: [PERFORM] Replication Syatem
Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So any suggestions on that :).

Thanks ~ Gauri

On Mon, Apr 28, 2008 at 7:28 PM, Peter Childs [EMAIL PROTECTED] wrote: 2008/4/28 Gauri Kanekar [EMAIL PROTECTED]: All, We have a table, table1, which gets inserts and updates daily in high numbers, because of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, in which the replicated DB will be up while the master is getting vacuumed. Can anybody suggest which would be the best suited replication solution for this? Thanks for any help. ~ Gauri I hope you're not using VACUUM FULL... (standard reply for this type of question). What version of PostgreSQL are you using? Have you tried autovacuum? Run plain vacuum on this table even more often (like every half hour) and it should not take as long, and it will save space. If you still have trouble, run vacuum analyse verbose table1; and see what it says. If you're doing it right you should be able to vacuum with the database up. Sounds like you might be happier with a fix for the problem rather than a complex workaround which will actually solve a completely different problem. Regards Peter.

-- Regards Gauri
Re: [PERFORM] Replication Syatem
On Mon, 2008-04-28 at 19:35 +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using 8.1.3 version. Auto vacuum is already on. But the table1 is so busy that auto vacuum don't get sufficient chance to vacuum it :(. You should seriously consider upgrading to PG 8.3. There have been substantial improvements to VACUUM since 8.1 Brad. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using 8.1.3 version. Auto vacuum is already on. But the table1 is so busy that auto vacuum don't get sufficient chance to vacuum it :(. Have already tried all the option listed by you, thats y we reached to the decision of having a replication sytsem. So any suggestion on that :). Thanx ~ Gauri We use slony for exactly this type of a situation. It's not the most user-friendly piece of software, but it works well enough that I can schedule maintenance windows (we're a 24/7 shop) and do clustering and other tasks on our DB to reclaim space, etc. -salman -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
That's one of the things to be done in the near future. But it needs some changes from the application point of view :( ... so this just got escalated for that reason. But for now, which one would be a well suited replication system?

~ Gauri

On Mon, Apr 28, 2008 at 7:43 PM, Brad Nicholson [EMAIL PROTECTED] wrote: On Mon, 2008-04-28 at 19:35 +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using 8.1.3 version. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You should seriously consider upgrading to PG 8.3. There have been substantial improvements to VACUUM since 8.1. Brad.

-- Regards Gauri
Re: [PERFORM] Replication Syatem
Salman, Slony doesn't do automatic failover, and we would appreciate a system with automatic failover :( ~ Gauri On Mon, Apr 28, 2008 at 7:46 PM, salman [EMAIL PROTECTED] wrote: Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So any suggestion on that :). Thanks ~ Gauri We use slony for exactly this type of situation. It's not the most user-friendly piece of software, but it works well enough that I can schedule maintenance windows (we're a 24/7 shop) and do clustering and other tasks on our DB to reclaim space, etc. -salman -- Regards Gauri
Re: [PERFORM] Replication Syatem
On Mon, Apr 28, 2008 at 07:35:37PM +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. VACUUM FULL is making your problem worse, not better. Don't do that. We are currently using version 8.1.3. You need to upgrade immediately to the latest 8.1 stability and security release, which is 8.1.11. This is a drop-in replacement. It's an urgent fix for your case. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You probably need to tune autovacuum not to do that table, and just vacuum that table in a constant loop or something. VACUUM should _never_ take the site down. If it does, you're doing it wrong. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So any suggestion on that :). I think you will find that no replication system will solve your underlying problems. That said, I happen to work for a company that will sell you a replication system to work with 8.1 if you really want it. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
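For illustration, "vacuum that table in a constant loop" can be as simple as issuing a plain VACUUM of the hot table from cron every few minutes. A minimal sketch, assuming the busy table really is named table1:

    -- Plain (non-FULL) VACUUM of just the hot table; in 8.1 this does not take
    -- an exclusive lock, so reads and writes continue while it runs.  Scheduled
    -- every few minutes, each pass stays small and finishes quickly.
    VACUUM ANALYZE table1;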
Re: [PERFORM] Replication Syatem
On Mon, Apr 28, 2008 at 07:48:48PM +0530, Gauri Kanekar wrote: Slony doesn't do automatic failover, and we would appreciate a system with automatic failover :( No responsible asynchronous system will give you automatic failover. You can lose data that way. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
On Mon, 28 Apr 2008, Gauri Kanekar wrote: We are doing vacuum full every alternate day. We also do vacuum analyze very often. We are currently using version 8.1.3... We have already tried all the options listed by you; that's why we reached the decision of having a replication system. Andrew Sullivan has already given a response here I agree with; I wanted to expand on that. You have a VACUUM problem. The fact that you need (or feel you need) to VACUUM FULL every other day says there's something very wrong here. The way to solve most VACUUM problems is to VACUUM more often, so that the work in each individual one never gets so big that your system takes an unacceptable hit, and you shouldn't ever need VACUUM FULL. Since your problem is being aggravated because you're running a dangerously obsolete version, that's one of the first things you should fix--to at least the latest 8.1 if you can't deal with a larger version migration. The fact that you're happily running 8.1.3 says you most certainly haven't tried all the other options here. Every minute you spend looking into a replication system is wasted time you could be spending on the right fix here. You've fallen into the common trap where you're fixated on a particular technical solution so much that you're now ignoring suggestions on how to resolve the root problem. Replication is hard to get going even on a system that works perfectly, and replicating a known buggy system just to work around a problem really sounds like a bad choice. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
On Mon, Apr 28, 2008 at 9:38 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: All, We have a table table1 which gets inserts and updates daily in high numbers, because of which its size keeps increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. Slony is an open source replication system built for Postgres. But the real problem is that you are doing a vacuum full every day. This is highly invasive. Take a look at the postgres docs on Vacuuming the db. Analyze is best on a daily basis. If you have a lot of deletes, then try vacuum truncate. The postgres documentation describes the various vacuum options and explains the merits of each. Hope that helps. Radhika -- It is all a matter of perspective. You choose your view by choosing where to stand. --Larry Wall -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Gauri Kanekar) writes: We have a table table1 which get insert and updates daily in high numbers, bcoz of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 take almost 30min and during that time the site is down. We need to cut down on this downtime.So thought of having a replication system, for which the replicated DB will be up during the master is getting vacuumed. Can anybody guide which will be the best suited replication solution for this. The only reason that it would be necessary for VACUUM to take the site down would be if you are running version 7.1, which was obsoleted in 2002, which, it should be noted, was SIX YEARS AGO. As has been noted, you seem to be presupposing a remarkably complex solution to resolve a problem which is likely to be better handled via running VACUUM rather more frequently. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://www3.sympatico.ca/cbbrowne/postgresql.html Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
Basically we have some background process which updates table1, and we don't want the application to make any changes to table1 while vacuum runs. Vacuum requires an exclusive lock on table1, and if any of the background processes or the application is running, vacuum doesn't kick off. That's the reason we need to take the site down. ~ Gauri On Tue, Apr 29, 2008 at 3:13 AM, Chris Browne [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] (Gauri Kanekar) writes: We have a table table1 which gets inserts and updates daily in high numbers, because of which its size keeps increasing and we have to vacuum it every alternate day. Vacuuming table1 takes almost 30 min, and during that time the site is down. We need to cut down on this downtime, so we thought of having a replication system, for which the replicated DB will be up while the master is getting vacuumed. Can anybody guide us on which would be the best suited replication solution for this? The only reason that it would be necessary for VACUUM to take the site down would be if you are running version 7.1, which was obsoleted in 2002, which, it should be noted, was SIX YEARS AGO. As has been noted, you seem to be presupposing a remarkably complex solution to resolve a problem which is likely to be better handled via running VACUUM rather more frequently. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://www3.sympatico.ca/cbbrowne/postgresql.html Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Regards Gauri
Re: [PERFORM] Replication Syatem
But unless we do a full vacuum the space is not recovered. That's why we prefer full vacuum. ~ Gauri On Tue, Apr 29, 2008 at 10:38 AM, Greg Smith [EMAIL PROTECTED] wrote: On Tue, 29 Apr 2008, Gauri Kanekar wrote: Basically we have some background process which updates table1, and we don't want the application to make any changes to table1 while vacuum runs. Vacuum requires an exclusive lock on table1, and if any of the background processes or the application is running, vacuum doesn't kick off. VACUUM FULL needs an exclusive lock; the regular one does not in 8.1. It's one of the reasons FULL should be avoided. If you do regular VACUUM frequently enough, you shouldn't ever need to do a FULL one anyway. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Regards Gauri
Re: [PERFORM] Replication Syatem
On Tue, 29 Apr 2008, Gauri Kanekar wrote: Basically we have some background process which updates table1, and we don't want the application to make any changes to table1 while vacuum runs. Vacuum requires an exclusive lock on table1, and if any of the background processes or the application is running, vacuum doesn't kick off. VACUUM FULL needs an exclusive lock; the regular one does not in 8.1. It's one of the reasons FULL should be avoided. If you do regular VACUUM frequently enough, you shouldn't ever need to do a FULL one anyway. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
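A quick way to see this lock difference for yourself (a sketch, assuming the table is named table1): start a plain VACUUM in one session, then inspect pg_locks from another. A plain VACUUM holds ShareUpdateExclusiveLock, which does not block ordinary reads and writes; VACUUM FULL holds AccessExclusiveLock, which blocks everything.

    -- In a second session, while the vacuum is still running:
    SELECT l.pid, l.mode, l.granted
      FROM pg_locks l
      JOIN pg_class c ON c.oid = l.relation
     WHERE c.relname = 'table1';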
Re: [PERFORM] Replication Syatem
Andrew, Can you explain in detail why you said vacuum full is making things worse? We do vacuum full, as vacuum verbose analyse doesn't regain space for us. ~ Gauri On Mon, Apr 28, 2008 at 9:52 PM, Andrew Sullivan [EMAIL PROTECTED] wrote: On Mon, Apr 28, 2008 at 07:35:37PM +0530, Gauri Kanekar wrote: Peter, We are doing vacuum full every alternate day. We also do vacuum analyze very often. VACUUM FULL is making your problem worse, not better. Don't do that. We are currently using version 8.1.3. You need to upgrade immediately to the latest 8.1 stability and security release, which is 8.1.11. This is a drop-in replacement. It's an urgent fix for your case. Autovacuum is already on, but table1 is so busy that autovacuum doesn't get a sufficient chance to vacuum it :(. You probably need to tune autovacuum not to do that table, and just vacuum that table in a constant loop or something. VACUUM should _never_ take the site down. If it does, you're doing it wrong. We have already tried all the options listed by you; that's why we reached the decision of having a replication system. So any suggestion on that :). I think you will find that no replication system will solve your underlying problems. That said, I happen to work for a company that will sell you a replication system to work with 8.1 if you really want it. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Regards Gauri
Re: [PERFORM] Replication Syatem
On Tue, Apr 29, 2008 at 10:41 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: But unless we do a full vacuum the space is not recovered. That's why we prefer full vacuum. There is no point in recovering the space by moving tuples and truncating the relation (that's what VACUUM FULL does) because you are doing frequent updates on the table and that would again extend the relation. If you run plain VACUUM, that would recover dead space and update the free space map. It may not be able to reduce the table size, but you should not be bothered much about it because the following updates/inserts will fill in the fragmented free space. You may want to check your FSM settings as well to make sure that you are tracking free space properly. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
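A minimal way to sanity-check the FSM settings mentioned above (a sketch; the values needed depend entirely on the installation):

    -- Current free space map limits, set in postgresql.conf on 8.1:
    SHOW max_fsm_pages;
    SHOW max_fsm_relations;
    -- A database-wide verbose vacuum reports, in the summary at the end of its
    -- output, how many page slots and relations the FSM actually needs.  If
    -- that exceeds the limits above, raise them in postgresql.conf and restart,
    -- or freed space stops being tracked and the table keeps growing.
    VACUUM VERBOSE;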
Re: [PERFORM] Replication Syatem
On Tue, Apr 29, 2008 at 11:16 AM, Gauri Kanekar [EMAIL PROTECTED] wrote: Andrew, Can you explain in detail why you said vacuum full is making things worse? 1. VACUUM FULL takes an exclusive lock on the table. That makes the table unavailable for reads/writes. 2. VACUUM FULL moves live tuples around. When a tuple is moved, the old index entry is deleted and a new index entry is inserted. This causes index bloat, which is hard to recover. We do vacuum full, as vacuum verbose analyse doesn't regain space for us. As I mentioned in the other reply, you are not gaining much by regaining space. The subsequent UPDATEs/INSERTs will quickly extend the relation and you lose all the work done by VACUUM FULL. Plain VACUUM will update the FSM to track the free space scattered across the relation, which is later reused by updates/inserts. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
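To make the index-bloat point concrete, a hedged sketch (assuming the table is named table1 and its indexes are named table1_*, which is not stated in the thread):

    -- Table and index sizes in 8K pages; relpages is refreshed by VACUUM/ANALYZE:
    SELECT relname, relkind, relpages
      FROM pg_class
     WHERE relname LIKE 'table1%'
     ORDER BY relpages DESC;
    -- REINDEX rebuilds indexes that repeated VACUUM FULL runs have bloated, but
    -- it takes an exclusive lock itself, so it belongs in a maintenance window.
    REINDEX TABLE table1;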
Re: [PERFORM] Replication
Hi, Andrew Sullivan wrote: This isn't quite true. Slony-II was originally conceived by Jan as an attempt to implement some of the Postgres-R ideas. Oh, right, thanks for that correction. Part of the problem, as near as I could tell, was that we had no group communication protocol that would really work. Spread needed a _lot_ of work (where lot of work may mean rewrite), and I just didn't have the humans to put on that problem. Another part of the problem was that, for high-contention workloads like the ones we happened to be working on, an optimistic approach like Postgres-R is probably always going to be a loser. Hm.. for high contention on single rows, sure, yes - you would mostly get rollbacks for conflicting transactions. But the optimism there is justified, as I think most real-world transactions don't conflict (or else you can work around such high single-row contention). You are right in that the serialization of the GCS can be a bottleneck. However, there's lots of research going on in that area and I'm convinced that Postgres-R has its value. Regards Markus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Replication
On Mon, Jun 18, 2007 at 08:54:46PM +0200, Markus Schiltknecht wrote: Postgres-R has been the name of the research project by Bettina Kemme et al. Slony-II was the name Neil and Gavin gave their attempt to continue that project. This isn't quite true. Slony-II was originally conceived by Jan as an attempt to implement some of the Postgres-R ideas. For our uses, however, Postgres-R had built into it a rather knotty design problem: under high-contention workloads, it will automatically increase the number of ROLLBACKs users experience. Jan had some ideas on how to solve this by moving around the GC events and doing slightly different things with them. To that end, Afilias sponsored a small workshop in Toronto during one of the coldest weeks the city has ever seen. This should have been a clue, perhaps. ;-) Anyway, the upshot of this was that two or three different approaches were attempted in prototypes. AFAIK, Neil and Gavin got the farthest, but just about everyone who was involved in the original workshop all independently concluded that the approach we were attempting to get to work was doomed -- it might go, but the overhead was great enough that it wouldn't be any benefit. Part of the problem, as near as I could tell, was that we had no group communication protocol that would really work. Spread needed a _lot_ of work (where lot of work may mean rewrite), and I just didn't have the humans to put on that problem. Another part of the problem was that, for high-contention workloads like the ones we happened to be working on, an optimistic approach like Postgres-R is probably always going to be a loser. A -- Andrew Sullivan | [EMAIL PROTECTED] In the future this spectacle of the middle classes shocking the avant- garde will probably become the textbook definition of Postmodernism. --Brad Holland ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Replication
On Thu, 2007-06-14 at 16:14 -0700, Craig James wrote: Looking for replication solutions, I find: Slony-I Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size). There's MOVE SET which transfers the origin (master) from one node to another without losing any committed transactions. There's also FAILOVER, which can set a new origin even if the old origin is completely gone, however you will lose the transactions that haven't been replicated yet. To have a new node join the cluster, you SUBSCRIBE SET, and you can MOVE SET to it later if you want that to be the master. Regards, Jeff Davis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Replication
Hi, Joshua D. Drake wrote: Slony-II Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality? Dead Not quite... there's still Postgres-R, see www.postgres-r.org And I'm continuously working on it, despite not having updated the website for almost a year now... I planned on releasing the next development snapshot together with 8.3; since 8.3 seems to be delayed, that timing seems realistic ;-) Regards Markus ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Replication
Markus Schiltknecht wrote: Not quite... there's still Postgres-R, see www.postgres-r.org And I'm continuously working on it, despite not having updated the website for almost a year now... I planned on releasing the next development snapshot together with 8.3, as that seems to be delayed, that seems realistic ;-) Is Postgres-R the same thing as Slony-II? There's a lot of info and news around about Slony-II, but your web page doesn't seem to mention it. While researching replication solutions, I had a heck of a time sorting out the dead or outdated web pages (like the stuff on gborg) from the active projects. Either way, it's great to know you're working on it. Craig ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Replication
Hi, Craig James wrote: Is Postgres-R the same thing as Slony-II? There's a lot of info and news around about Slony-II, but your web page doesn't seem to mention it. Hm... true. Good point. Maybe I should add a FAQ: Postgres-R has been the name of the research project by Bettina Kemme et al. Slony-II was the name Neil and Gavin gave their attempt to continue that project. I've based my work on the old (6.4.2) Postgres-R source code - and I'm still calling it Postgres-R, probably Postgres-R (8) to distinguish it from the original one. But I'm thinking about changing the name completely... however, I'm a developer, not a marketing guru. While researching replication solutions, I had a heck of a time sorting out the dead or outdated web pages (like the stuff on gborg) from the active projects. Yeah, that's one of the main problems with replication for PostgreSQL. I hope Postgres-R (or whatever name I'll come up with in the future) can change that. Either way, it's great to know you're working on it. Maybe you want to join its mailing list [1]? I'll try to get some discussion going there in the near future. Regards Markus [1]: Postgres-R on gborg: http://pgfoundry.org/projects/postgres-r/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Replication
On 6/15/07, Craig James [EMAIL PROTECTED] wrote: I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could bring the whole system down. If you're using it for INSERT/UPDATE, then there can only be one PGPool server. Are you sure? I have been considering this possibility, too, but I didn't find anything in the documentation. The main mechanism of the proxy is taking received updates and playing them on multiple servers with 2PC, and the proxies should not need to keep any state about this, so why couldn't you install multiple proxies? Alexander. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Replication
On 6/14/07, Craig A. James [EMAIL PROTECTED] wrote: Looking for replication solutions, I find: Slony-I Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size). with reasonable sysadmin effort you can implement a failover system yourself. regarding communications, you can cascade the replication to reduce load on the master. If you were implementing a large replication cluster, this would probably be a good idea. Slony is powerful, trigger based, and highly configurable. Slony-II Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality? aiui, this has not gone beyond early planning phases. PGReplication Appears to be a page that someone forgot to erase from the old GBorg site. PGCluster Seems pretty good, but web site is not current, there are releases in use that are not on the web site, and also seems to always be a couple steps behind the current release of Postgres. Two single-points failure spots, load balancer and the data replicator. Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything? pgpool 1/2 is a reasonable solution. it's statement level replication, which has some downsides, but is good for certain things. pgpool 2 has a neat distributed table mechanism which is interesting. You might want to be looking here if you have extremely high ratios of read to write but need to service a huge transaction volume. PITR is a HA solution which 'replicates' a database cluster to an archive or a warm (can be brought up quickly, but not available for querying) standby server. Overhead is very low and it's easy to set up. This is maybe the simplest and best solution if all you care about is continuous backup. There are plans (a GSoC project, actually) to make the warm standby live for (read only) queries...if/when complete, this would provide a replication mechanism similar to, but significantly better than, mysql binary log replication, and would provide an excellent complement to slony. there is also the mammoth replicator...I don't know anything about it, maybe someone could comment? merlin ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Replication
Hello, On Thu, 2007-06-14 at 16:14 -0700, Craig James wrote: Cluster Seems pretty good, but web site is not current, http://www.pgcluster.org is a bit up2date, also http://pgfoundry.org/projects/pgcluster is up2date (at least downloads page :) ) Regards, -- Devrim GÜNDÜZ PostgreSQL Replication, Consulting, Custom Development, 24x7 support Managed Services, Shared and Dedicated Hosting Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/
Re: [PERFORM] Replication
On Thu, 14 Jun 2007 17:38:01 -0700 Craig James [EMAIL PROTECTED] wrote: I would consider PGCluster, but it seems to be a patch to Postgres itself. I'm reluctant to introduce such a major piece of technology Yes it is. For most of the time it is not very much behind actual versions of postgresql. The project's biggest drawbacks, as I see them: - horrible documentation - changing configuration without any warning/help to the user (as far as there are only rc-s, I can't really blame the developers for that... :) ) - there are only rc-s, no stable version available for current postgresql releases. I think this project needs someone who speaks English very well and has the time and will to coordinate and document all the code that is written. Otherwise the idea and the solution seem to be very good. If someone - with big luck and a lot of trial-and-error effort - sets up a working system, then it will be stable and working for a long time. into our entire system, when only one tiny part of it needs the replication service. Thanks, Craig Rgds, Akos -- Regards, Gábriel Ákos -=E-Mail :[EMAIL PROTECTED]|Web: http://www.i-logic.hu =- -=Tel/fax:+3612367353|Mobil:+36209278894 =- ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[PERFORM] Replication
Looking for replication solutions, I find: Slony-I Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size). Slony-II Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality? PGReplication Appears to be a page that someone forgot to erase from the old GBorg site. PGCluster Seems pretty good, but web site is not current, there are releases in use that are not on the web site, and also seems to always be a couple steps behind the current release of Postgres. Two single-points failure spots, load balancer and the data replicator. Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything? Thanks! Craig ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Replication
Craig James wrote: Looking for replication solutions, I find: Slony-I Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size). Yep Slony-II Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality? Dead PGReplication Appears to be a page that someone forgot to erase from the old GBorg site. Dead PGCluster Seems pretty good, but web site is not current, there are releases in use that are not on the web site, and also seems to always be a couple steps behind the current release of Postgres. Two single-points failure spots, load balancer and the data replicator. Slow as all get out for writes but cool idea Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything? log shipping, closed source solutions Thanks! Craig ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Replication
Which replication problem are you trying to solve? On Thu, 14 Jun 2007, Craig James wrote: Looking for replication solutions, I find: Slony-I Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size). Slony-II Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality? PGReplication Appears to be a page that someone forgot to erase from the old GBorg site. PGCluster Seems pretty good, but web site is not current, there are releases in use that are not on the web site, and also seems to always be a couple steps behind the current release of Postgres. Two single-points failure spots, load balancer and the data replicator. Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything? Thanks! Craig ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Replication
On 6/15/07, Craig James [EMAIL PROTECTED] wrote: [snip] Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything? * Mammoth Replicator, commercial. * Continuent uni/cluster, commercial (http://www.continuent.com/index.php?option=com_contenttask=viewid=212Itemid=169). * pgpool-II. Supports load-balancing and replication by implementing a proxy that duplicates all updates to all slaves. It can partition data by doing this, and it can semi-intelligently route queries to the appropriate servers. * Cybertec. This is a commercial packaging of PGCluster-II from an Austrian company. * Greenplum Database (formerly Bizgres MPP), commercial. Not so much a replication solution as a way to parallelize queries, and targeted at the data warehousing crowd. Similar to ExtenDB, but tightly integrated with PostgreSQL. * DRBD (http://www.drbd.org/), a device driver that replicates disk blocks to other nodes. This works for failover only, not for scaling reads. Easy migration of devices if combined with an NFS export. * Skytools (https://developer.skype.com/SkypeGarage/DbProjects/SkyTools), a collection of replication tools from the Skype people. Purports to be simpler to use than Slony. Lastly, and perhaps most promisingly, there's the Google Summer of Code effort by Florian Pflug (http://code.google.com/soc/postgres/appinfo.html?csaid=6545828A8197EBC6) to implement true log-based replication, where PostgreSQL's transaction logs are used to keep live slave servers up to date with a master. In theory, such a system would be extremely simple to set up and use, especially since it should, as far as I can see, also transparently replicate the schema for you. Alexander. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Replication
On Thu, Jun 14, 2007 at 6:14 PM, in message [EMAIL PROTECTED], Craig James [EMAIL PROTECTED] wrote: Looking for replication solutions, I find: Slony-I Slony-II PGReplication PGCluster You wouldn't guess it from the name, but pgpool actually supports replication: http://pgpool.projects.postgresql.org/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Replication
Thanks to all who replied and filled in the blanks. The problem with the web is you never know if you've missed something. Joshua D. Drake wrote: Looking for replication solutions, I find... Slony-II Dead Wow, I'm surprised. Is it dead for lack of need, lack of resources, too complex, or all of the above? It sounded like such a promising theoretical foundation. Ben wrote: Which replication problem are you trying to solve? Most of our data is replicated offline using custom tools tailored to our loading pattern, but we have a small amount of global information, such as user signups, system configuration, advertisements, and such, that go into a single small (~5-10 MB) global database used by all servers. We need nearly-real-time replication, and instant failover. That is, it's far more important for the system to keep working than it is to avoid losing a little data. Transactional integrity is not important. Actual hardware failures are rare, and if a user just happens to sign up, or save preferences, at the instant the global-database server goes down, it's not a tragedy. But it's not OK for the entire web site to go down when the one global-database server fails. Slony-I can keep several slave databases up to date, which is nice. And I think I can combine it with a PGPool instance on each server, with the master as primary and a few Slony copies as secondary. That way, if the master goes down, the PGPool servers all switch to their secondary Slony slaves, and read-only access can continue. If the master crashes, users will be able to do most activities, but new users can't sign up, and existing users can't change their preferences, until either the master server comes back, or one of the slaves is promoted to master. The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this. I would consider PGCluster, but it seems to be a patch to Postgres itself. I'm reluctant to introduce such a major piece of technology into our entire system, when only one tiny part of it needs the replication service. Thanks, Craig ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] Replication
Looking for replication solutions, I find: Slony-I Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size). Slony-II Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality? PGReplication Appears to be a page that someone forgot to erase from the old GBorg site. PGCluster Seems pretty good, but web site is not current, there are releases in use that are not on the web site, and also seems to always be a couple steps behind the current release of Postgres. Two single-points failure spots, load balancer and the data replicator. Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything? Thanks! Craig (Sorry about the premature send of this message earlier, please ignore.) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Replication
Most of our data is replicated offline using custom tools tailored to our loading pattern, but we have a small amount of global information, such as user signups, system configuration, advertisements, and such, that go into a single small (~5-10 MB) global database used by all servers. Slony provides near instantaneous failovers (in the single digit seconds range). You can script an automatic failover if the master server becomes unreachable. That leaves you the problem of restarting your app (or making it reconnect) to the new master. 5-10MB of data implies such a fast initial replication that making the server rejoin the cluster by setting it up from scratch is not an issue. The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this. You don't need SOAP services, and you do not need to elect a new master. If dbX goes down, dbY takes over; you should be able to decide on a static takeover pattern easily enough. The point here is that the servers need to react to a problem, but you probably want to get the admin on duty to look at the situation as quickly as possible anyway. With 5-10MB of data in the database, a complete rejoin from scratch to the cluster is measured in minutes. Furthermore, you need to check out pgpool; I seem to remember that it has some bad habits in routing queries. (E.g. it wants to apply write queries to all nodes, but slony makes the other nodes readonly. Furthermore, anything inside a BEGIN is sent to the master node, which is bad with some ORMs that by default wrap any access into a transaction) Andreas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Replication
Andreas Kostyrka wrote: Slony provides near instantaneous failovers (in the single digit seconds range). You can script an automatic failover if the master server becomes unreachable. But Slony slaves are read-only, correct? So the system isn't fully functional once the master goes down. That leaves you the problem of restarting your app (or making it reconnect) to the new master. Don't you have to run a Slony app to convert one of the slaves into the master? 5-10MB data implies such a fast initial replication, that making the server rejoin the cluster by setting it up from scratch is not an issue. The problem is to PREVENT it from rejoining the cluster. If you have some semi-automatic process that detects the dead server and converts a slave to the master, and in the mean time the dead server manages to reboot itself (or its network gets fixed, or whatever the problem was), then you have two masters sending out updates, and you're screwed. The problem is, there don't seem to be any vote a new master type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this. You don't need SOAP services, and you do not need to elect a new master. if dbX goes down, dbY takes over, you should be able to decide on a static takeover pattern easily enough. I can't see how that is true. Any self-healing distributed system needs something like the following: - A distributed system of nodes that check each other's health - A way to detect that a node is down and to transmit that information across the nodes - An election mechanism that nominates a new master if the master fails - A way for a node coming online to determine if it is a master or a slave Any solution less than this can cause corruption because you can have two nodes that both think they're master, or end up with no master and no process for electing a master. As far as I can tell, Slony doesn't do any of this. Is there a simpler solution? I've never heard of one. The point here is, that the servers need to react to a problem, but you probably want to get the admin on duty to look at the situation as quickly as possible anyway. No, our requirement is no administrator interaction. We need instant, automatic recovery from failure so that the system stays online. Furthermore, you need to checkout pgpool, I seem to remember that it has some bad habits in routing queries. (E.g. it wants to apply write queries to all nodes, but slony makes the other nodes readonly. Furthermore, anything inside a BEGIN is sent to the master node, which is bad with some ORMs, that by default wrap any access into a transaction) I should have been more clear about this. I was planning to use PGPool in the PGPool-1 mode (not the new PGPool-2 features that allow replication). So it would only be acting as a failover mechanism. Slony would be used as the replication mechanism. I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could bring the whole system down. If you're using it for INSERT/UPDATE, then there can only be one PGPool server. I was thinking I'd put a PGPool server on every machine in failover mode only. It would have the Slony master as the primary connection, and a Slony slave as the failover connection. The applications would route all INSERT/UPDATE statements directly to the Slony master, and all SELECT statements to the PGPool on localhost. 
When the master failed, all of the PGPool servers would automatically switch to one of the Slony slaves. This way, the system would keep running on the Slony slaves (so it would be read-only), until a sysadmin could get the master Slony back online. And when the master came online, the PGPool servers would automatically reconnect and write-access would be restored. Does this make sense? Craig ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Replication
Craig James wrote: Andreas Kostyrka wrote: Slony provides near instantaneous failovers (in the single digit seconds range). You can script an automatic failover if the master server becomes unreachable. But Slony slaves are read-only, correct? So the system isn't fully functional once the master goes down. That is what promotion is for. Joshua D. Drake ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Replication
What about Daffodil Replicator - GPL - http://sourceforge.net/projects/daffodilreplica/ -- Thanks, Eugene Ogurtsov Internal Development Chief Architect SWsoft, Inc. Craig A. James wrote: Looking for replication solutions, I find: Slony-I Seems good, single master only, master is a single point of failure, no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size). Slony-II Seems brilliant, a solid theoretical foundation, at the forefront of computer science. But can't find project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality? PGReplication Appears to be a page that someone forgot to erase from the old GBorg site. PGCluster Seems pretty good, but web site is not current, there are releases in use that are not on the web site, and also seems to always be a couple steps behind the current release of Postgres. Two single-points failure spots, load balancer and the data replicator. Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything? Thanks! Craig (Sorry about the premature send of this message earlier, please ignore.) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Replication
Ok, slony supports two kinds of operation here: failover (which moves the master node to a new one without the old master node being present, it also drops the old node from replication) and move set (which moves the master node with cooperation) The usecases for these two are slightly different. one is for all kinds of scheduled maintenance, while the other is what you do when you've got a hardware failure. Andreas -- Ursprüngl. Mitteil. -- Betreff:Re: [PERFORM] Replication Von:Craig James [EMAIL PROTECTED] Datum: 15.06.2007 01:48 Andreas Kostyrka wrote: Slony provides near instantaneous failovers (in the single digit seconds range). You can script an automatic failover if the master server becomes unreachable. But Slony slaves are read-only, correct? So the system isn't fully functional once the master goes down. That leaves you the problem of restarting your app (or making it reconnect) to the new master. Don't you have to run a Slony app to convert one of the slaves into the master? 5-10MB data implies such a fast initial replication, that making the server rejoin the cluster by setting it up from scratch is not an issue. The problem is to PREVENT it from rejoining the cluster. If you have some semi-automatic process that detects the dead server and converts a slave to the master, and in the mean time the dead server manages to reboot itself (or its network gets fixed, or whatever the problem was), then you have two masters sending out updates, and you're screwed. The problem is, there don't seem to be any vote a new master type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this. You don't need SOAP services, and you do not need to elect a new master. if dbX goes down, dbY takes over, you should be able to decide on a static takeover pattern easily enough. I can't see how that is true. Any self-healing distributed system needs something like the following: - A distributed system of nodes that check each other's health - A way to detect that a node is down and to transmit that information across the nodes - An election mechanism that nominates a new master if the master fails - A way for a node coming online to determine if it is a master or a slave Any solution less than this can cause corruption because you can have two nodes that both think they're master, or end up with no master and no process for electing a master. As far as I can tell, Slony doesn't do any of this. Is there a simpler solution? I've never heard of one. The point here is, that the servers need to react to a problem, but you probably want to get the admin on duty to look at the situation as quickly as possible anyway. No, our requirement is no administrator interaction. We need instant, automatic recovery from failure so that the system stays online. Furthermore, you need to checkout pgpool, I seem to remember that it has some bad habits in routing queries. (E.g. it wants to apply write queries to all nodes, but slony makes the other nodes readonly. Furthermore, anything inside a BEGIN is sent to the master node, which is bad with some ORMs, that by default wrap any access into a transaction) I should have been more clear about this. I was planning to use PGPool in the PGPool-1 mode (not the new PGPool-2 features that allow replication). So it would only be acting as a failover mechanism. Slony would be used as the replication mechanism. 
I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could bring the whole system down. If you're using it for INSERT/UPDATE, then there can only be one PGPool server. I was thinking I'd put a PGPool server on every machine in failover mode only. It would have the Slony master as the primary connection, and a Slony slave as the failover connection. The applications would route all INSERT/UPDATE statements directly to the Slony master, and all SELECT statements to the PGPool on localhost. When the master failed, all of the PGPool servers would automatically switch to one of the Slony slaves. This way, the system would keep running on the Slony slaves (so it would be read-only), until a sysadmin could get the master Slony back online. And when the master came online, the PGPool servers would automatically reconnect and write-access would be restored. Does this make sense? Craig ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Replication: Slony-I vs. Mammoth Replicator vs. ?
One more point for your list: Choose Slony if Replicator doesn't support your platform. :-) -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D.Khera Communications, Inc. Internet: [EMAIL PROTECTED] Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Replication: Slony-I vs. Mammoth Replicator vs. ?
Once again, Joshua, would you please explain what you mean by batch and live replication systems? Slony does group multiple master transactions into one replication transaction to improve performance (fewer commits on the slaves). The interval of these groups is configurable and for high volume DBs it is recommended to use about one second, which means that all commits that fall into an interval of one second are replicated in one transaction on the slave. On normal running systems this results in a replication lag of 600 to 800 milliseconds on average. On overloaded systems the asynchronous nature of course allows the slaves to fall behind. Your description above is what I considered batch... you are taking a batch of transactions and replicating them versus each transaction. I am not saying it is bad in any way. I am just saying it is different than replicator. What is a usual average replication lag of Mammoth Replicator? Obviously it depends on the system, the network connectivity between the systems etc... In our test systems it takes less than 100 ms to replicate the data. Again it depends on the size of the transaction (the data being moved). What happens to the other existing slaves when you promote by hand? This is something that Slony has over replicator. Currently the new master will force a full dump to the slaves. Of course this is already on the road map, thanks to Slony :) and should be resolved by month's end. The Slony documentation is an issue at the moment and the administrative tools around it are immature. The replication engine itself exceeds my own expectations and performs very robustly. I have never suggested otherwise. My only comment about maturity is that there are actually many companies using replicator in production. We have already dealt with the 1.0 blues as they say. I hope you understand that I in no way have ever suggested (purposely) anything negative about Slony. Only that I believe they serve different technical solutions. Sincerely, Joshua D. Drake Jan -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Replication: Slony-I vs. Mammoth Replicator vs. ?
Centuries ago, Nostradamus foresaw when [EMAIL PROTECTED] (Joshua D. Drake) would write: I hope you understand that I, in no way have ever suggested (purposely) anything negative about Slony. Only that I believe they serve different technical solutions. Stipulating that I may have some bias ;-), I still don't find it at all clear what the different situations are shaped like that lead to Mammoth being forcibly preferable to Slony-I. (Note that I have a pretty decent understanding about how ERS and Slony work, so I'm not too frightened of technicalities... I set up instances of both on Thursday, so I'm pretty up to speed :-).) Win32 support may be true at the moment, although I have to discount that as we only just got the start of a beta release of native Win32 support for PostgreSQL proper. For that very reason, I had to point my youngest brother who needed something better than Access to Firebird last Saturday; I played with my niece while he was doing the install. And there is little reason to think that Slony-I won't be portable to Win32 given a little interest and effort, particularly once work to make it play well with pgxs gets done. -- (format nil [EMAIL PROTECTED] cbbrowne ntlug.org) http://www3.sympatico.ca/cbbrowne/multiplexor.html At Microsoft, it doesn't matter which file you're compiling, only which flags you #define. -- Colin Plumb ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Replication: Slony-I vs. Mammoth Replicator vs. ?
Christopher Browne wrote: Centuries ago, Nostradamus foresaw when [EMAIL PROTECTED] ("Joshua D. Drake") would write: I hope you understand that I in no way have ever suggested (purposely) anything negative about Slony. Only that I believe they serve different technical solutions. Stipulating that I may have some bias ;-), I still don't find it at all clear what the different situations are "shaped like" that lead to Mammoth being forcibly preferable to Slony-I. I would choose replicator if: 1. You want ease of setup 2. You want each of your transactions to be replicated at time of commit 3. Your database is already laden with triggers 4. You are pushing a very high transactional load* * Caveat: I have no idea how well Slony performs on a system that does, say, 200,000 transactions an hour that are heavily geared toward updates. Replicator performs very well in this scenario. 5. Replicator's administrative tools are more mature than Slony's (for example, you know exactly what state your slaves are in with Replicator). I would choose Slony if: 1. The fact that it is Open Source matters to you 2. The auto promotion of slaves is important* *This will be fixed in a couple of weeks with Replicator To be fair, in the real world it doesn't make a bit of difference which one you choose -- it really comes down to this: Replicator is dumb simple to set up. Any halfway talented person can set up replicator in 30 minutes with a single master / slave configuration. Slony is Open Source and thus a little easier on the pocketbook initially. Command Prompt will support either one -- so the "Replicator is commercially supported" argument is a little weak here. Sincerely, Joshua D. Drake (Note that I have a pretty decent understanding about how ERS and Slony work, so I'm not too frightened of technicalities... I set up instances of both on Thursday, so I'm pretty up to speed :-).) Win32 support may be true at the moment, although I have to discount that as we only just got the start of a beta release of native Win32 support for PostgreSQL proper. For that very reason, I had to point my youngest brother who needed "something better than Access" to Firebird last Saturday; I played with my niece while he was doing the install. And there is little reason to think that Slony-I won't be portable to Win32 given a little interest and effort, particularly once work to make it play well with "pgxs" gets done. -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
Re: [PERFORM] Replication: Slony-I vs. Mammoth Replicator vs. ?
Chris Cheston wrote: Hi all, I'm trying to implement a highly scalable, high-performance, real-time database replication system to back up my Postgres database as data gets written. So far, Mammoth Replicator is looking pretty good but it costs $1000+. Yes but it includes 30 days of support and 12 months of upgrades/updates :) Has anyone tried Slony-I and other replication systems? Slony-I is pretty new so I'm a little unsure if it's ready for a prime-time commercial system yet. It really depends on your needs. They are both good systems. Slony-I is a bit more of a beast to get up and running, and it is a batch replication system that uses triggers. Once it is up and running it works well though. Mammoth Replicator is easy to set up and is integrated into PostgreSQL. However, replicator is $1000+ and doesn't support promoting slaves automatically (you can do it by hand) like Slony does. Replicator is also live replication. Sincerely, Joshua D. Drake So... wanted to put this out to the experts. Has anyone got any recommendations or had experiences with real-time database replication solutions that don't rely on RAID? The reason why I don't want to rely on a hardware solution is because we are renting dedicated servers and we don't have access to the boxes, only to software that gets installed on the boxes. Thanks, Chris ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org