Re: [GENERAL] - PostgreSQL Replication Types
On 12/17/2015 07:56 AM, Will McCormick wrote:
> Thanks a ton for the prompt response. I've read most of this but some of it was not clear until we discussed.
> > See here for more detail:
> WLM: Reading now :)

While reading I would suggest having the postgresql.conf files on the master and the standby, and the recovery.conf file on the standby, open so you can see how your settings match up. I would also take a look at what is in: /opt/postgres/9.1/archive/

-- Adrian Klaver adrian.kla...@aklaver.com
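As a minimal sketch of those checks (the archive path is the one quoted in this thread; the data directory path is an assumption for illustration, not from the original posts):

    # See which WAL segments have actually reached the archive, newest first
    ls -lt /opt/postgres/9.1/archive/ | head

    # Compare the replication-related settings on master and standby
    grep -E 'wal_level|max_wal_senders|wal_keep_segments|archive_mode|archive_command|hot_standby' \
        /opt/postgres/9.1/data/postgresql.conf

    # On the standby, see what recovery.conf points at
    cat /opt/postgres/9.1/data/recovery.conf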
Re: [GENERAL] - PostgreSQL Replication Types
Thanks for the great assistance.
Re: [GENERAL] - PostgreSQL Replication Types
Hi Will,

On 12/17/15 10:17 AM, Will McCormick wrote:
> I inherited a 9.1 replication environment.
>
> A few basic questions that I can't find clear answers / clarifications for, if possible.
>
> There are 3 types of replication in 9.1 I've read about from the official docs:
>
> 1) warm standby

This is a standby which is applying WAL segments from the master (via restore_command log shipping or streaming replication).

> 2) hot standby

Just like a warm standby, but adds the ability to run read-only queries (and the master must know it is a hot standby).

> 3) streaming replication

Another way to deliver WAL from the master. The advantage over restore_command log shipping is that changes are applied as soon as they happen on the master, rather than when a completed WAL segment is pushed from the master to the archive.

> I'm using streaming replication I believe; the only indication I have is that there is the primary_conninfo on the standby. Is this the only indication?
>
> Is it possible to tell if I'm using streaming replication under normal operations?
>
> cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory
>
> cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory
>
> LOG: streaming replication successfully connected to primary
>
> FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 0007000F0057 has already been removed

What this means is your standby has been out of date for some time. The WAL segment it needs has been removed from both the archive (/opt/postgres/9.1/archive) and the master's pg_xlog directory. Your only option now (unless you can find the required WAL segments somewhere) is to take a new backup using pg_basebackup or some other mechanism to bring the standby up to date.

> My understanding is that warm standby and hot standby do log shipping, and there is a greater window for transactions not to be sent to the standby because a WAL XLOG segment must be filled first.

A hot or warm standby can be maintained with either log shipping or streaming replication.

> Whereas streaming replication basically sends at the transaction level?

Yes, this is the advantage of streaming replication.

I have written a tutorial that covers setting up a hot standby with or without streaming replication using pgBackRest:

http://www.pgbackrest.org/user-guide.html#replication

You can replace backup/restore/archive with other methods but the principle remains the same. The tutorial is for 9.4 but should work equally well with 9.1.

-- -David da...@pgmasters.net
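A minimal sketch of the pg_basebackup rebuild suggested above, for 9.1; the host name, replication user, and data directory are assumptions for illustration, not from the original posts:

    # Run on the standby, as the postgres user, after stopping the server.
    pg_ctl -D /opt/postgres/9.1/data stop -m fast

    # Move the stale data directory aside and pull a fresh base backup
    # from the master (-x includes the WAL needed to make it consistent).
    mv /opt/postgres/9.1/data /opt/postgres/9.1/data.old
    pg_basebackup -h master.example.com -U replication_user -D /opt/postgres/9.1/data -P -x

    # Recreate recovery.conf (standby_mode, primary_conninfo, restore_command),
    # then start the standby again.
    pg_ctl -D /opt/postgres/9.1/data start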
[GENERAL] - PostgreSQL Replication Types
I inherited a 9.1 replication environment.

A few basic questions that I can't find clear answers / clarifications for, if possible.

There are 3 types of replication in 9.1 I've read about from the official docs:

1) warm standby
2) hot standby
3) streaming replication

I believe I'm using streaming replication; the only indication I have is that there is a primary_conninfo on the standby. Is this the only indication? Is it possible to tell if I'm using streaming replication under normal operations?

cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory

cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory

LOG: streaming replication successfully connected to primary

FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 0007000F0057 has already been removed

My understanding is that warm standby and hot standby do log shipping, and there is a greater window for transactions not to be sent to the standby because a WAL XLOG segment must be filled first. Whereas streaming replication basically sends at the transaction level? I'm sure this is somewhat misinformed!

Thanks, Will
Re: [GENERAL] - PostgreSQL Replication Types
On 12/17/2015 07:56 AM, Will McCormick wrote:

Thanks a ton for the prompt response. I've read most of this but some of it was not clear until we discussed.

Updated with WLM:

On 12/17/2015 07:17 AM, Will McCormick wrote:

I inherited a 9.1 replication environment.

A few basic questions that I can't find clear answers / clarifications for, if possible.

There are 3 types of replication in 9.1 I've read about from the official docs:

1) warm standby
2) hot standby
3) streaming replication

I believe I'm using streaming replication; the only indication I have is that there is a primary_conninfo on the standby. Is this the only indication?

WLM: I also see streaming replication in the logs.

On standby:

http://www.postgresql.org/docs/9.1/interactive/functions-admin.html

"pg_last_xlog_receive_location()

Get last transaction log location received and synced to disk by streaming replication. While streaming replication is in progress this will increase monotonically. If recovery has completed this will remain static at the value of the last WAL record received and synced to disk during recovery. If streaming replication is disabled, or if it has not yet started, the function returns NULL."

WLM: When I do this on the standby I get an error:

ERROR: recovery is in progress
HINT: WAL control functions cannot be ...

What is the full hint message? The functions are supposed to be able to be run while the server is in recovery.

Is it possible to tell if I'm using streaming replication under normal operations?

cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory

cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory

LOG: streaming replication successfully connected to primary

FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 0007000F0057 has already been removed

Assuming the above is from the standby log, correct? WLM: yes

The cp lines would seem to indicate a restore_command in the standby recovery.conf, is that the case?:

http://www.postgresql.org/docs/9.1/static/archive-recovery-settings.html

restore_command (string) WLM: Correct

The FATAL indicates that the WAL file has already been recycled on the master.

WLM: I had read this; what confuses me a bit is:

cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory

Does streaming replication automatically use archived WAL files when the WAL XLOG files don't contain a transaction?

We did have wal_keep_segments set to 0. I changed this to 50 but want to better understand this, especially the correlation between the archived WALs and the XLOG WALs. My guess is the difference between streaming replication and the others is very simply that streaming replication can read the XLOG WALs as well? So if all the archived WALs have been shipped and processed to the standby, then the XLOGs are processed but not shipped? This meaning at a transaction level "kind of"?

Not really; see the section below (#STREAMING-REPLICATION) I posted previously. It is either/or: if streaming is set up and the standby can connect to the master, it will stream the WAL from the master's pg_xlog directory. If the standby cannot reach the master, and you have WAL archiving set up on the master and archive restore set up on the standby, it will switch to full WAL log shipping from the archived WAL directory, assuming the files it needs are there.

See:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-MASTER

"wal_keep_segments (integer)

Specifies the minimum number of past log file segments kept in the pg_xlog directory, in case a standby server needs to fetch them for streaming replication. Each segment is normally 16 megabytes. If a standby server connected to the primary falls behind by more than wal_keep_segments segments, the primary might remove a WAL segment still needed by the standby, in which case the replication connection will be terminated. (However, the standby server can recover by fetching the segment from archive, if WAL archiving is in use.)

This sets only the minimum number of segments retained in pg_xlog; the system might need to retain more segments for WAL archival or to recover from a checkpoint. If wal_keep_segments is zero (the default), the system doesn't keep any extra segments for standby purposes, so the number of old WAL segments available to standby servers is a function of the location of the previous checkpoint and status of WAL archiving. This parameter has no effect on restartpoints. This parameter can only be set in the postgresql.conf file or on the server command line."

http://www.postgresql.org/docs/9.1/interactive/warm-standby.html#STREAMING-REPLICATION

"If you use streaming replication without file-based continuous archiving, you have to set wal_keep_segments in the master to a value high enough to ensure that old WAL segments are not recycled too early, while the standby might still need them to catch up. If the standby falls behind too much, it needs to be reinitialized from a new base backup. If you set up a WAL archive that's accessible from the standby, wal_keep_segments is not required as the standby can always use the archive to catch up."
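To make the either/or behaviour described above concrete, here is a minimal sketch of the master-side 9.1 settings involved. Only the archive path comes from this thread; the values are illustrative, not the poster's actual configuration:

    # postgresql.conf on the master (9.1)
    wal_level = hot_standby            # WAL carries enough detail for a standby
    max_wal_senders = 3                # allows streaming connections from standbys
    wal_keep_segments = 50             # extra pg_xlog segments kept for slow standbys
    archive_mode = on                  # also copy completed segments to the archive
    archive_command = 'cp %p /opt/postgres/9.1/archive/%f'

With both streaming and archiving configured, a standby that loses the streaming connection (or falls more than wal_keep_segments behind) can still catch up from the archive via its restore_command.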
Re: [GENERAL] - PostgreSQL Replication Types
Almost forgot this:

SELECT pg_current_xlog_location();
ERROR: recovery is in progress
HINT: WAL control functions cannot be executed during recovery.

bms=> SELECT pg_current_xlog_location();
ERROR: recovery is in progress
HINT: WAL control functions cannot be executed during recovery.
Re: [GENERAL] - PostgreSQL Replication Types
On 12/17/2015 07:17 AM, Will McCormick wrote:

I inherited a 9.1 replication environment.

A few basic questions that I can't find clear answers / clarifications for, if possible.

There are 3 types of replication in 9.1 I've read about from the official docs:

1) warm standby
2) hot standby
3) streaming replication

I believe I'm using streaming replication; the only indication I have is that there is a primary_conninfo on the standby. Is this the only indication?

On standby:

http://www.postgresql.org/docs/9.1/interactive/functions-admin.html

"pg_last_xlog_receive_location()

Get last transaction log location received and synced to disk by streaming replication. While streaming replication is in progress this will increase monotonically. If recovery has completed this will remain static at the value of the last WAL record received and synced to disk during recovery. If streaming replication is disabled, or if it has not yet started, the function returns NULL."

Is it possible to tell if I'm using streaming replication under normal operations?

cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory

cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory

LOG: streaming replication successfully connected to primary

FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 0007000F0057 has already been removed

Assuming the above is from the standby log, correct?

The cp lines would seem to indicate a restore_command in the standby recovery.conf, is that the case?:

http://www.postgresql.org/docs/9.1/static/archive-recovery-settings.html

restore_command (string)

The FATAL indicates that the WAL file has already been recycled on the master. See:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-MASTER

"wal_keep_segments (integer)

Specifies the minimum number of past log file segments kept in the pg_xlog directory, in case a standby server needs to fetch them for streaming replication. Each segment is normally 16 megabytes. If a standby server connected to the primary falls behind by more than wal_keep_segments segments, the primary might remove a WAL segment still needed by the standby, in which case the replication connection will be terminated. (However, the standby server can recover by fetching the segment from archive, if WAL archiving is in use.)

This sets only the minimum number of segments retained in pg_xlog; the system might need to retain more segments for WAL archival or to recover from a checkpoint. If wal_keep_segments is zero (the default), the system doesn't keep any extra segments for standby purposes, so the number of old WAL segments available to standby servers is a function of the location of the previous checkpoint and status of WAL archiving. This parameter has no effect on restartpoints. This parameter can only be set in the postgresql.conf file or on the server command line."

http://www.postgresql.org/docs/9.1/interactive/warm-standby.html#STREAMING-REPLICATION

"If you use streaming replication without file-based continuous archiving, you have to set wal_keep_segments in the master to a value high enough to ensure that old WAL segments are not recycled too early, while the standby might still need them to catch up. If the standby falls behind too much, it needs to be reinitialized from a new base backup. If you set up a WAL archive that's accessible from the standby, wal_keep_segments is not required as the standby can always use the archive to catch up."

My understanding is that warm standby and hot standby do log shipping, and there is a greater window for transactions not to be sent to the standby because a WAL XLOG segment must be filled first.

Hot versus warm standby refers to whether it is possible to run read-only queries on the standby in the first case, or not in the second case.

Whereas streaming replication basically sends at the transaction level?

The difference you are looking for is log shipping versus streaming, where log shipping moves complete WAL files and streaming streams the same WAL data continuously. See here for more detail:

http://www.postgresql.org/docs/9.1/interactive/high-availability.html

I'm sure this is somewhat misinformed!

Thanks, Will

-- Adrian Klaver adrian.kla...@aklaver.com
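To illustrate that log-shipping-versus-streaming distinction on the standby side, a minimal 9.1 recovery.conf sketch (the archive path is the one from this thread; the connection values are placeholders):

    # recovery.conf on the standby (9.1)
    standby_mode = 'on'

    # Streaming: pull WAL over a libpq connection as it is generated.
    primary_conninfo = 'host=master.example.com port=5432 user=replication_user'

    # Log shipping: fetch completed 16 MB segments from the WAL archive.
    # Also used as a fallback when streaming cannot supply a segment.
    restore_command = 'cp /opt/postgres/9.1/archive/%f %p'

If only restore_command is set you have classic file-based log shipping; if only primary_conninfo is set you depend entirely on wal_keep_segments on the master; setting both gives streaming with the archive as a safety net.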
Re: [GENERAL] - PostgreSQL Replication Types
Thanks a ton for the prompt response. I've read most of this but some of it was not clear until we discussed.

Updated with WLM:

On 12/17/2015 07:17 AM, Will McCormick wrote:
>
> I inherited a 9.1 replication environment.
>
> A few basic questions that I can't find clear answers / clarifications for, if possible.
>
> There are 3 types of replication in 9.1 I've read about from the official docs:
>
> 1) warm standby
> 2) hot standby
> 3) streaming replication
>
> I believe I'm using streaming replication; the only indication I have is that there is a primary_conninfo on the standby. Is this the only indication?

WLM: I also see streaming replication in the logs.

On standby:

http://www.postgresql.org/docs/9.1/interactive/functions-admin.html

"pg_last_xlog_receive_location()

Get last transaction log location received and synced to disk by streaming replication. While streaming replication is in progress this will increase monotonically. If recovery has completed this will remain static at the value of the last WAL record received and synced to disk during recovery. If streaming replication is disabled, or if it has not yet started, the function returns NULL."

WLM: When I do this on the standby I get an error:

ERROR: recovery is in progress
HINT: WAL control functions cannot be ...

> Is it possible to tell if I'm using streaming replication under normal operations?
>
> cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory
>
> cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory
>
> LOG: streaming replication successfully connected to primary
>
> FATAL: could not receive data from WAL stream: FATAL: requested WAL segment 0007000F0057 has already been removed

Assuming the above is from the standby log, correct? WLM: yes

The cp lines would seem to indicate a restore_command in the standby recovery.conf, is that the case?:

http://www.postgresql.org/docs/9.1/static/archive-recovery-settings.html

restore_command (string) WLM: Correct

The FATAL indicates that the WAL file has already been recycled on the master.

WLM: I had read this; what confuses me a bit is:

cp: cannot stat `/opt/postgres/9.1/archive/0007000F0057': No such file or directory

Does streaming replication automatically use archived WAL files when the WAL XLOG files don't contain a transaction?

We did have wal_keep_segments set to 0. I changed this to 50 but want to better understand this, especially the correlation between the archived WALs and the XLOG WALs. My guess is the difference between streaming replication and the others is very simply that streaming replication can read the XLOG WALs as well? So if all the archived WALs have been shipped and processed to the standby, then the XLOGs are processed but not shipped? This meaning at a transaction level "kind of"?

See:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-replication.html#RUNTIME-CONFIG-REPLICATION-MASTER

"wal_keep_segments (integer)

Specifies the minimum number of past log file segments kept in the pg_xlog directory, in case a standby server needs to fetch them for streaming replication. Each segment is normally 16 megabytes. If a standby server connected to the primary falls behind by more than wal_keep_segments segments, the primary might remove a WAL segment still needed by the standby, in which case the replication connection will be terminated. (However, the standby server can recover by fetching the segment from archive, if WAL archiving is in use.)

This sets only the minimum number of segments retained in pg_xlog; the system might need to retain more segments for WAL archival or to recover from a checkpoint. If wal_keep_segments is zero (the default), the system doesn't keep any extra segments for standby purposes, so the number of old WAL segments available to standby servers is a function of the location of the previous checkpoint and status of WAL archiving. This parameter has no effect on restartpoints. This parameter can only be set in the postgresql.conf file or on the server command line."

http://www.postgresql.org/docs/9.1/interactive/warm-standby.html#STREAMING-REPLICATION

"If you use streaming replication without file-based continuous archiving, you have to set wal_keep_segments in the master to a value high enough to ensure that old WAL segments are not recycled too early, while the standby might still need them to catch up. If the standby falls behind too much, it needs to be reinitialized from a new base backup. If you set up a WAL archive that's accessible from the standby, wal_keep_segments is not required as the standby can always use the archive to catch up."

> My understanding is that warm standby and hot standby do log shipping and there is a greater window for transactions not to be sent to the standby because WAL XLOG must be filled.

> Hot versus warm standby refers to whether it is possible to run read-only queries on the standby in the first case, or not in the second case.
Re: [GENERAL] - PostgreSQL Replication Types
On 12/17/2015 08:43 AM, Will McCormick wrote:
> Almost forgot this:
>
> SELECT pg_current_xlog_location();

I was not paying attention earlier; that should be:

pg_last_xlog_receive_location()

from:

http://www.postgresql.org/docs/9.1/interactive/functions-admin.html
Table 9-58. Recovery Information Functions

> ERROR: recovery is in progress
> HINT: WAL control functions cannot be executed during recovery.
>
> bms=> SELECT pg_current_xlog_location();
> ERROR: recovery is in progress
> HINT: WAL control functions cannot be executed during recovery.

-- Adrian Klaver adrian.kla...@aklaver.com
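Following that correction, a minimal sketch of the checks that do work on a 9.1 standby (pg_current_xlog_location() cannot be executed during recovery, which is exactly the error shown above):

    -- On the standby: these recovery information functions are allowed.
    SELECT pg_is_in_recovery();                 -- true while in standby/recovery
    SELECT pg_last_xlog_receive_location();     -- non-NULL only when streaming
    SELECT pg_last_xlog_replay_location();      -- last WAL location replayed

    -- On the master: list connected streaming standbys (9.1+).
    SELECT * FROM pg_stat_replication;

If pg_last_xlog_receive_location() returns a value that keeps advancing, the standby is receiving WAL over streaming replication rather than only via the archive.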
Re: [GENERAL] Postgresql replication assistance
Gustav Potgieter wrote:
> Hope you can assist and that I am posting to the right forum.

Sending the question twice with the same wording does not make it clearer...

> We currently have multiple PostgreSQL 9 instances running with warm standby, and the replication works wonderfully. The problem is the following: we take the slave database out of recovery and it works perfectly, but when we create a new database and set up streaming replication to point to this database, it is read-only and never starts up.
>
> I created a test database and it did replicate across, but I could only see it once I removed recovery.conf and restarted, obviously not in recovery anymore. I attached the logs at the bottom FYI.
>
> Is this even possible, to re-create a streaming database again from an ex-slave database with a base backup and simply re-creating recovery.conf before startup?

If I understand you correctly, there are three computers involved: an original master, an original slave that is taken out of recovery (promoted?), and a new slave that is set up using the original slave as master. Right?

That should work if you do it right.

You quote from a number of logs, but to me it is not clear which log belongs to which server. Let's say the original master is A, the original slave and new master is B, and the new slave is C. It would be interesting to see B's log from when replication is set up and starts working, B's log from the time it is promoted to master, and C's log from the time when you try to set it up as B's slave.

Do you share WAL archives between the machines? If yes, how? How exactly did you promote B?

Yours, Laurenz Albe
[GENERAL] Postgresql replication assistance
Hi All,

Hope you can assist and that I am posting to the right forum.

We currently have multiple PostgreSQL 9 instances running with warm standby, and the replication works wonderfully. The problem is the following: we take the slave database out of recovery and it works perfectly, but when we create a new database and set up streaming replication to point to this database, it is read-only and never starts up. I created a test database and it did replicate across, but I could only see it once I removed recovery.conf and restarted, obviously not in recovery anymore. I attached the logs at the bottom FYI.

Is this even possible, to re-create a streaming database again from an ex-slave database with a base backup and simply re-creating recovery.conf before startup?

Thank you, Gustav

## LOGS ON SLAVE

LOG: entering standby mode
LOG: record with zero length at 1D/3C000634
LOG: streaming replication successfully connected to primary
LOG: redo starts at 1D/3C000634
FATAL: the database system is starting up

## recovery.conf

standby_mode = 'on'
primary_conninfo = 'host=emr-db-1 port=5432 user=postgres password=postgres'
trigger_file = '/tmp/pgsql.trigger'

## and postgresql.conf

# - Standby Servers -
hot_standby = on    # on allows queries during recovery
                    # (change requires restart)

## Solaris 10# postmaster --version
postgres (PostgreSQL) 9.0.9

## Solaris 10# uname -a
SunOS database-machine-1 4 5.10 Generic_142910-17 i86pc i386 i86pc

The database did not recover when we created a trigger file, and I had to manually remove recovery.conf and restart the database before it started up and I was able to run live queries again, obviously not in recovery anymore.

LOG: trigger file found: /tmp/pgsql.trigger
FATAL: terminating walreceiver process due to administrator command
LOG: redo done at 1D/47002BC8
LOG: last completed transaction was at log time 2012-09-10 18:33:04.734928+02
FATAL: WAL ends before consistent recovery point
LOG: startup process (PID 28326) exited with exit code 1
LOG: terminating any other active server processes
LOG: database system was interrupted while in recovery at log time 2012-09-10 18:32:21 SAST
HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG: entering standby mode
LOG: redo starts at 1D/4520
LOG: invalid record length at 1D/47002C04
LOG: streaming replication successfully connected to primary
FATAL: the database system is starting up
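For reference, pg_basebackup does not exist in 9.0, so re-creating a new slave from a promoted master is usually done with a file-level base backup. A minimal sketch with hypothetical host names and paths, not the poster's actual setup (the "WAL ends before consistent recovery point" FATAL above typically means recovery ran out of WAL before reaching a consistent state, which a fresh base backup avoids):

    # On the new master: mark the start of the backup (second arg = fast checkpoint).
    psql -U postgres -c "SELECT pg_start_backup('rebuild_slave', true);"

    # Copy the data directory to the new slave (paths are illustrative).
    rsync -a --exclude pg_xlog --exclude postmaster.pid /data/pgsql/ new-slave:/data/pgsql/

    # On the master again: finish the backup; with WAL archiving enabled this
    # makes the WAL needed for consistency available to the slave.
    psql -U postgres -c "SELECT pg_stop_backup();"

    # On the new slave: create recovery.conf pointing at the new master,
    # then start the server:
    #   standby_mode = 'on'
    #   primary_conninfo = 'host=new-master port=5432 user=postgres'
    #   restore_command = 'cp /shared/archive/%f %p'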
[GENERAL] Postgresql Replication Comparison Required
I am doing a POC on PostgreSQL replication, using the latest version of PostgreSQL, i.e. 9.1. There are multiple replication solutions available in the market (PGCluster, Pgpool-II, Slony-I). PostgreSQL also provides built-in replication solutions (streaming replication, warm standby and hot standby).

I am confused about which solution is best for the financial application for which I am doing the POC. The application will write around 160 million records with a row size of 2.5 KB into the database. My question is: which replication solution will be suitable for the following scenarios?

1. If I require replication for backup purposes only
2. If I need to scale reads
3. If I require high availability and consistency

Also, it will be very helpful if you can share performance numbers or experience with PostgreSQL replication solutions.

Thanks
Re: [GENERAL] Postgresql Replication Comparison Required
Hello,

In general my advice would be to stick with native features, therefore use either streaming replication or, alternatively, log-shipping replication. You might need some tools to help you manage the cluster, client routing and balancing, but I suggest you look into this later.

On Thu, 29 Dec 2011 16:55:00 +0530, saurabh gupta saurabh@gmail.com wrote:
> If I require replication for backup purposes only

For disaster recovery, you need physical base backups with continuous archiving (http://www.postgresql.org/docs/9.1/interactive/continuous-archiving.html).

> If I need to scale reads

You need hot standby here (and you might need to route read-only applications to one of the slaves somehow, but you can worry about this later).

> If I require high availability and consistency

Streaming replication. With 9.1 you also have synchronous streaming replication, which means you have zero data loss of committed transactions within your PostgreSQL cluster.

Another useful tool you might want to look into is repmgr (www.repmgr.org).

> Also, it will be very helpful if you can share performance numbers or experience with PostgreSQL replication solutions.

I wish I could help you more here, but most of our professional work is performed under strict NDAs. An interesting and useful documentation section is also this:

http://www.postgresql.org/docs/9.1/interactive/high-availability.html

I hope this helps.

Cheers, Gabriele

-- Gabriele Bartolini - 2ndQuadrant Italia PostgreSQL Training, Services and Support gabriele.bartol...@2ndquadrant.it - www.2ndQuadrant.it
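As a concrete illustration of the 9.1 synchronous streaming replication mentioned above, a minimal sketch; the host and standby names are placeholders, not part of the original post:

    # postgresql.conf on the master (9.1)
    wal_level = hot_standby
    max_wal_senders = 3
    synchronous_standby_names = 'standby1'   # commit waits for this standby
    # synchronous_commit = on                # the default; 'local' disables waiting

    # recovery.conf on the standby: application_name must match the name above
    standby_mode = 'on'
    primary_conninfo = 'host=master.example.com port=5432 user=replication_user application_name=standby1'

With this in place, a COMMIT on the master does not return until standby1 has confirmed the transaction's WAL is safely on disk, which is the zero-data-loss property described above.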
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Mar 25, 2008, at 4:28 PM, Jeff Davis wrote:
> This obviously does not work in real time, but it may be useful. It does not require a lot of additional space to do this because of the ZFS copy-on-write implementation.

But what benefit does it give you if you're pounding on the same set of physical disks? You might as well run it on the original, since your limit is the disk I/O.
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Tue, 2008-03-25 at 17:53 -0600, Keaton Adams wrote:
> That is an interesting question. If our organization were to help fund the development of such a feature, would that be something taken into consideration by the development team?

Yes. Many of my major projects have been funded that way. I have funding now for some projects, but not this one. Most of my smaller contributions and advocacy work are not made for direct reward.

I'm trying to get in touch with Florian to discuss working together on this.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Wednesday, March 26, 2008, Greg Smith wrote:
> (My favorite acronym is TLA)

Hehe :) I'd vote for A...

-- dim
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Wed, Mar 26, 2008 at 01:03:34AM -0400, Greg Smith wrote:
> against. People who are using the current warm-standby code are already grappling with issues like how to coordinate master/slave failover (including my second favorite acronym, STONITH for "shoot the other node in the head"). I don't expect handling that sort of thing will ever be integrated into the PostgreSQL database core.

Note that most other database products don't integrate it in their core either. They package separate tools for it, and sell it as a single system, but you can often buy the separate tools independently. Oracle's RAC is an exception, but it also works completely differently from any of this.

A
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
[EMAIL PROTECTED] (Keaton Adams) writes:
> That is an interesting question. If our organization were to help fund the development of such a feature, would that be something taken into consideration by the development team?

I seem to recall there being a relevant Google Summer of Code project about this, last year.

http://code.google.com/soc/2007/postgres/appinfo.html?csaid=6545828A8197EBC6
http://archives.postgresql.org/pgsql-hackers/2007-03/msg00050.php

I do not recall how far it got. It obviously didn't make it into 8.3 ;-)!

-- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://cbbrowne.com/info/rdbms.html Why are they called apartments, when they're all stuck together?
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
Chris Browne wrote:
> I seem to recall there being a relevant Google Summer of Code project about this, last year.
> I do not recall how far it got. It obviously didn't make it into 8.3 ;-)!

Some parts of it did -- for example we got read-only transactions, which were a step towards that goal. (The point here is that a hot standby needs to be able to execute read-only transactions.)

-- Alvaro Herrera  http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
[GENERAL] PostgreSQL Replication with read-only access to standby DB
Our organization is looking for a hot-standby option for PostgreSQL that uses the WAL (transaction) data to keep the standby current and also allows the standby to be read-only accessible for reporting. We have implemented WAL shipping through a set of scripts we developed, and that works well to have a standby DB at the ready in case we need to fail over, but we are looking to increase the value of the standby server by making it available for queries. Because of the complexities of our environment, using a table/trigger based replication method such as Slony won't work well.

It would be great if there was a solution (Open Source or Commercial) that worked in a similar manner as Oracle Active Data Guard:

“Oracle Active Data Guard enables a physical standby database to be open for read-only access for reporting, simple or complex queries while changes from the production database are being applied to it. This means any operation that requires up-to-date read-only access can be offloaded to the replica, enhancing and protecting the performance of the production database.”

“All queries reading from the physical replica execute in real-time, and return current results. A Data Guard configuration consists of one production (or primary) database and up to nine standby databases. A standby database is initially created from a backup copy of the primary database. Once created, Data Guard automatically maintains the standby database as a synchronized copy of the primary database by transmitting primary database redo data to the standby system and then applying the redo data to the standby database.”

Does anyone know of such a solution for PostgreSQL?

Thanks, Keaton
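For context, script-based WAL shipping of that era (8.x warm standby) boils down to an archive_command on the primary and a restore_command on the standby. A minimal sketch with placeholder paths and host names, not the poster's actual scripts:

    # postgresql.conf on the primary: push each completed WAL segment out
    archive_mode = on
    archive_command = 'rsync -a %p standby:/var/lib/pgsql/archive/%f'

    # recovery.conf on the warm standby: replay segments as they arrive.
    # pg_standby (contrib, 8.3+) waits for the next segment instead of
    # failing when it is not there yet, keeping the server in recovery.
    restore_command = 'pg_standby /var/lib/pgsql/archive %f %p'

The standby stays in continuous recovery and cannot be queried, which is exactly the limitation this post is asking to remove.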
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
Keaton Adams wrote:
> Does anyone know of such a solution for PostgreSQL?

IIRC, it was mentioned previously in one posting that this is a TODO for a future version of Postgres, but not something that's expected soon. Someone please correct me if I'm wrong.

-salman
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Tue, Mar 25, 2008 at 1:17 PM, salman [EMAIL PROTECTED] wrote:
> IIRC, it was mentioned previously in one posting that this is a TODO for a future version of Postgres, but not something that's expected soon. Someone please correct me if I'm wrong.

This is what I saw on the TODO list:

Write-Ahead Log
* Allow WAL traffic to be streamed to another server for stand-by replication

-- Regards, Richard Broersma Jr.
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
But will that stand-by replication provide for a read-only slave?

On 3/25/08 2:26 PM, Richard Broersma [EMAIL PROTECTED] wrote:
> This is what I saw on the TODO list:
>
> Write-Ahead Log
> * Allow WAL traffic to be streamed to another server for stand-by replication
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Tue, 2008-03-25 at 14:11 -0600, Keaton Adams wrote:
> Does anyone know of such a solution for PostgreSQL?

There has been work in this area, but unfortunately this feature has not been completed yet. Many people would like this feature.

I have heard of a strategy used by some PostgreSQL users in which they use something like ZFS (which allows fast filesystem snapshots/clones) to quickly clone the data in a separate area, and then bring up the database on the clone for reporting purposes. This obviously does not work in real time, but it may be useful. It does not require a lot of additional space to do this because of the ZFS copy-on-write implementation.

Regards, Jeff Davis
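A minimal sketch of the ZFS approach described above; the pool, dataset, and port are hypothetical, and details such as snapshot consistency and removing a standby's recovery.conf on the clone depend on the particular setup:

    # Snapshot the standby's data directory and clone it; the clone shares
    # unchanged blocks with the origin, so it costs little extra space.
    zfs snapshot tank/pgdata@reporting
    zfs clone tank/pgdata@reporting tank/pgdata_report

    # Bring up a second postmaster on the clone, on its own port. It will
    # run recovery on the cloned files, then accept queries.
    pg_ctl -D /tank/pgdata_report start -o "-p 5433"

    # When the reporting copy grows stale, shut it down and repeat with a
    # fresh snapshot (the clone must be destroyed before its snapshot).
    pg_ctl -D /tank/pgdata_report stop -m fast
    zfs destroy tank/pgdata_report
    zfs destroy tank/pgdata@reporting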
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
It is close, but has limitations that will be problematic for our environment, such as:

Replicator will not replicate the schema. You must restore your schema to the slaves from the master before you begin replication.

Replicator can only replicate one database. If you have multiple databases you can either initialize clusters for each database or move all databases into a single database using schemas/namespaces.

It is possible to add and drop columns to replicated tables within Replicator. This type of change to your table structure will require a full sync and therefore is best done in batch or after hours.

Thanks for the reply, Keaton

On 3/25/08 2:18 PM, Richard Broersma [EMAIL PROTECTED] wrote:
> I think this does what you want.
> http://commandprompt.com/products/mammothreplicator/
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Tue, 2008-03-25 at 14:11 -0600, Keaton Adams wrote:
> Does anyone know of such a solution for PostgreSQL?

Some funding would help that move forwards. If you or others would consider that, it would help, even if just to provide the seed for additional contributors.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Tue, Mar 25, 2008 at 3:08 PM, Simon Riggs [EMAIL PROTECTED] wrote:
> Some funding would help that move forwards. If you or others would consider that, it would help, even if just to provide the seed for additional contributors.

That is an interesting thought; is it possible to earmark donations for a specific (set of) todo(s)?

-- Regards, Richard Broersma Jr.
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Tue, Mar 25, 2008 at 1:11 PM, Keaton Adams [EMAIL PROTECTED] wrote:
> Our organization is looking for a hot-standby option for PostgreSQL that uses the WAL (transaction) data to keep the standby current and also allows the standby to be read-only accessible for reporting.
> Does anyone know of such a solution for PostgreSQL?

I think this does what you want.

http://commandprompt.com/products/mammothreplicator/

-- Regards, Richard Broersma Jr.
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
That is an interesting question. If our organization were to help fund the development of such a feature, would that be something taken into consideration by the development team?

-Keaton

On 3/25/08 4:32 PM, Richard Broersma [EMAIL PROTECTED] wrote:
> On Tue, Mar 25, 2008 at 3:08 PM, Simon Riggs [EMAIL PROTECTED] wrote:
>> Some funding would help that move forwards. If you or others would consider that, it would help, even if just to provide the seed for additional contributors.
> That is an interesting thought; is it possible to earmark donations for a specific (set of) todo(s)?
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
A similar case has already happened. For example, I proposed implementing the WITH RECURSIVE clause, and the work is supported by Sumitomo Electric Information Systems Co., Ltd. (http://www.sei-info.co.jp/) and SRA OSS, Inc. Japan (http://www.sraoss.co.jp). -- Tatsuo Ishii SRA OSS, Inc. Japan That is an interesting question. If our organization were to help fund the development of such a feature, would that be something taken into consideration by the development team? -Keaton On 3/25/08 4:32 PM, Richard Broersma [EMAIL PROTECTED] wrote: On Tue, Mar 25, 2008 at 3:08 PM, Simon Riggs [EMAIL PROTECTED] wrote: On Tue, 2008-03-25 at 14:11 -0600, Keaton Adams wrote: Some funding would help that move forwards. If you or others would consider that, it would help, even if just to provide the seed for additional contributors. That is an interesting thought, is it possible to earmark donations for a specific (set of) todo(s)? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
I'm in the same boat, looking for master-slave replication for 1 master / 2 'standby' read-only servers (one would get promoted to master in case of failure). I recently read about WAL here: http://developer.postgresql.org/pgdocs/postgres/warm-standby.html "The standby server is not available for access, since it is continually performing recovery processing. PostgreSQL does not provide the system software required to identify a failure on the primary and notify the standby database server. Many such tools exist and are well integrated with other aspects required for successful failover, such as IP address migration." In short, there's not much automation magic at the moment, and it doesn't seem like what you're looking for. Pgpool-II might be the best alternative. I know very little about PostgreSQL internals, but it would be great if:

- WAL files could be applied while the standby server is operational / read-only queries were allowed
- the master server could send WAL files to standby servers (the TODO item: Allow WAL traffic to be streamed to another server)
- the master server could send a list of all known standby servers
- a standby server could check whether the master is alive and promote itself to master (it would need to make sure the other standby servers do not try to promote themselves at the same time)

Then, in many ways, you could use a pool to query the read-only standby servers. As I was writing this out, I thought this would make a great SoC project, but then found it already exists! http://code.google.com/soc/2007/postgres/appinfo.html?csaid=6545828A8197EBC6 Great news, I'd be happy to pitch in any time to help design a solution like this :) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Keaton Adams Sent: March 25, 2008 4:29 PM To: Richard Broersma; salman Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB But will that stand-by replication provide for a read-only slave? On 3/25/08 2:26 PM, Richard Broersma [EMAIL PROTECTED] wrote: On Tue, Mar 25, 2008 at 1:17 PM, salman [EMAIL PROTECTED] wrote: IIRC, it was mentioned previously in one posting that this is a TODO for a future version of postgres but not something that's expected soon. Someone please correct me if I'm wrong. This is what I saw on the TODO list: Write-Ahead Log * Allow WAL traffic to be streamed to another server for stand-by replication -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
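On the self-promotion wish above: with the warm-standby tooling of this era, promotion is normally done by an operator or an external cluster manager through a trigger file, not by the standby itself. A hedged sketch, again assuming pg_standby; the trigger path and archive directory are hypothetical:

    # recovery.conf on the standby
    restore_command = 'pg_standby -t /tmp/pgsql.trigger /var/lib/pgsql/archive %f %p'

    # to promote the standby, e.g. from a heartbeat/failover script:
    touch /tmp/pgsql.trigger

Once the trigger file appears, pg_standby stops waiting for further WAL, recovery finishes, and the server begins accepting read-write connections.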
Re: [GENERAL] PostgreSQL Replication with read-only access to standby DB
On Tue, 25 Mar 2008, Jonathan Bond-Caron wrote: I know very little about PostgreSQL internals, but it would be great if: - WAL files could be applied while the standby server is operational / read-only queries were allowed This is the part that requires modifying PostgreSQL, and progress was made toward it by Florian's GSoC project last summer. - the master server could send WAL files to standby servers / WAL traffic could be streamed to another server - the master server could send a list of all known standby servers - a standby server could check whether the master is alive and promote itself to master (it would need to make sure the other standby servers do not try to promote themselves at the same time) These parts you could build right now, except that there's not too much value to more than one standby if you're not using them to execute queries against. People who are using the current warm-standby code are already grappling with issues like how to coordinate master/slave failover (including my second favorite acronym, STONITH, for "shoot the other node in the head"). I don't expect handling that sort of thing will ever be integrated into the PostgreSQL database core. What is happening instead is that the appropriate interfaces to allow building higher-level tools are being designed and made available. (My favorite acronym is TLA) -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] PostGreSQL Replication
> Sounds like something you'd want to handle within the application

I believe I will try to follow this path. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] PostGreSQL Replication
On Sat, Jul 07, 2007 at 05:16:56AM -0700, Gabriele wrote: Let's have a server which feeds data to multiple slaves, usually using direct online connections. Now, we may want to allow those clients to sync the data to a local replica, work offline and then resync the data back to the server. What is the easiest way to approach this problem? Write yourself a new system that does it? To my knowledge, nobody has yet built a production-ready system that can do this sort of multi-master merge-back replication. I can think of possible ways you could bodge this up using Slony-I and log shipping, but I don't think it'd be pretty. resources on this topic? I've found people talking about Slony-I; what can you tell me about this tool? Are there other tools? You can find out more about Slony-I at http://www.slony.info. Moreover, in addition to the previous question, let's take the case where I have multiple independent and separate servers and I want to upload the information from these servers (actually not everything, but a subset of the data) to another server (here we have unidirectional communication, as opposed to the previous bidirectional communication, and an asymmetric structure as opposed to the previous exact replica). What are the tools, documentation, best practices or other resources to approach this problem? This also sounds like a multimaster problem, and again I know of no system that currently supports what you're talking about. This is easier to bodge together under Slony using views and suchlike, but it's still not trivial. If you wish to discuss how to do this with Slony, I suggest taking it up on that list (available from the site mentioned above). A -- Andrew Sullivan | [EMAIL PROTECTED] A certain description of men are for getting out of debt, yet are against all taxes for raising money to pay it off. --Alexander Hamilton ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] PostGreSQL Replication
On 07.07.2007, at 06:16, Gabriele wrote: Let's have a server which feeds data to multiple slaves, usually using direct online connections. Now, we may want to allow those clients to sync the data to a local replica, work offline and then resync the data back to the server. What is the easiest way to approach this problem? I don't know of anything for PostgreSQL that does that. FrontBase [1] might have something like that, if I understood Geert correctly. But I'm not sure about that. Maybe you should ask them directly. Their DBMS is free but not open source, though. cug [1] http://www.frontbase.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] PostGreSQL Replication
On Saturday 07 July 2007 14.16:56 Gabriele wrote: I know this is a delicate topic which must be approached cautiously. Let's have a server which feeds data to multiple slaves, usually using direct online connections. Now, we may want to allow those clients to sync the data to a local replica, work offline and then resync the data back to the server. What is the easiest way to approach this problem? Sounds like something you'd want to handle within the application, not at the database layer. The application can know much more about how data is modified and how it can be modified safely so a later merge operation won't fail, or how merge conflicts (however these are defined) can be safely handled. The database must assume that a user can modify anything during the offline period, and can't assume anything about what to do on merge operations. cheers -- vbi -- The days just prior to marriage are like a snappy introduction to a tedious book.
[GENERAL] PostGreSQL Replication
I know this is a delicate topic which must be approached cautiously. Let's have a server which feeds data to multiple slaves, usually using direct online connections. Now, we may want to allow those clients to sync the data to a local replica, work offline and then resync the data back to the server. What is the easiest way to approach this problem? Can you point me to tools, documentation, tech references or other resources on this topic? I've found people talking about Slony-I; what can you tell me about this tool? Are there other tools? Moreover, in addition to the previous question, let's take the case where I have multiple independent and separate servers and I want to upload the information from these servers (actually not everything, but a subset of the data) to another server (here we have unidirectional communication, as opposed to the previous bidirectional communication, and an asymmetric structure as opposed to the previous exact replica). What are the tools, documentation, best practices or other resources to approach this problem? Thank you! ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
On 5/2/07, Jamie Deppeler [EMAIL PROTECTED] wrote: You might want to check pgcluster out http://pgcluster.projects.postgresql.org/ which does both. That page will give you the impression that this project is dead and abandoned -- the last update is from early 2005. PGCluster does seem to be active on PgFoundry: http://pgfoundry.org/projects/pgcluster/ Mind you, PGCluster is synchronous replication. For updates it scales even worse than Slony, since every update has to wait for all masters to complete. Alexander. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
Alexander Staubo wrote: On 5/2/07, Jamie Deppeler [EMAIL PROTECTED] wrote: You might want to check pgcluster out http://pgcluster.projects.postgresql.org/ which does both. That page will give you the impression that this project is dead and abandoned -- the last update is from early 2005. PGCluster does seem to be active on PgFoundry: http://pgfoundry.org/projects/pgcluster/ Huh, that page (the one Jamie linked to) _is_ pgfoundry's project page. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
On 5/2/07, Alvaro Herrera [EMAIL PROTECTED] wrote: Alexander Staubo wrote: On 5/2/07, Jamie Deppeler [EMAIL PROTECTED] wrote: You might want to check pgcluster out http://pgcluster.projects.postgresql.org/ which does both. That page will give you the impression that this project is dead and abandoned -- the last update is from early 2005. PGCluster does seem to be active on PgFoundry: http://pgfoundry.org/projects/pgcluster/ Huh, that page (the one Jamie linked to) _is_ pgfoundry's project page. I don't know what's going on here, but: $ curl -s http://pgcluster.projects.postgresql.org/ | md5 dad4aaf6659f1a65f228cee1ec71eba4 $ curl -s http://pgfoundry.org/projects/pgcluster/ | md5 3c6645ab3bbffa4e9f77a1ccab6a663a They're different pages. The first one is horribly out of date; unfortunately, it is (for me) the first hit on Google, whereas the PgFoundry project page is the third. Alexander. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
Alexander Staubo wrote: On 5/2/07, Alvaro Herrera [EMAIL PROTECTED] wrote: Alexander Staubo wrote: On 5/2/07, Jamie Deppeler [EMAIL PROTECTED] wrote: You might want to check pgcluster out http://pgcluster.projects.postgresql.org/ which does both. That page will give you the impression that this project is dead and abandoned -- the last update is from early 2005. PGCluster does seem to be active on PgFoundry: http://pgfoundry.org/projects/pgcluster/ Huh, that page (the one Jamie linked to) _is_ pgfoundry's project page. I don't know what's going on here, but: $ curl -s http://pgcluster.projects.postgresql.org/ | md5 dad4aaf6659f1a65f228cee1ec71eba4 $ curl -s http://pgfoundry.org/projects/pgcluster/ | md5 3c6645ab3bbffa4e9f77a1ccab6a663a They're different pages. The first one is horribly out of date; unfortunately, it is (for me) the first hit on Google, whereas the PgFoundry project page is the third. Sure, they are different pages, but the first one is supposed to be maintained by the pgcluster guys. It is linked from the second page, and it is hosted in pgfoundry. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
On 5/2/07, Alvaro Herrera [EMAIL PROTECTED] wrote: They're different pages. The first one is horribly out of date; unfortunately, it is (for me) the first hit on Google, whereas the PgFoundry project page is the third. Sure, they are different pages, but the first one is supposed to be maintained by the pgcluster guys. It is linked from the second page, and it is hosted in pgfoundry. My entire point was that the first one *isn't* maintained. So avoid it if you're looking for up-to-date information about this project. Alexander. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
Alexander Staubo wrote: On 5/2/07, Alvaro Herrera [EMAIL PROTECTED] wrote: They're different pages. The first one is horribly out of date; unfortunately, it is (for me) the first hit on Google, whereas the PgFoundry project page is the third. Sure, they are different pages, but the first one is supposed to be maintained by the pgcluster guys. It is linked from the second page, and it is hosted in pgfoundry. My entire point was that the first one *isn't* maintained. So avoid it if you're looking for up-to-date information about this project. My entire point was that that page is also on pgfoundry :-) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
Hello, We're building a database system with replication. Slony-I seems to be a quite good solution for the replication, but besides the replication (master-to-multiple-slaves), we need load balancing as well - multiple users will access the database at the same time = multiple queries. Is Slony-I capable of load balancing, and how do we set it up? We searched the web and some people mentioned that Slony-I could do load balancing, but we haven't found how to make Slony-I do it. Simple applications like pgpool could do load balancing, too, but they seem to be too simple to be used as a replication system. Thanks for any help. J. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
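For reference, the master-to-multiple-slaves side of this is set up through a slonik script. A minimal single-table sketch for the 1.x-era tool (the cluster name, conninfo strings, and table name are all hypothetical, and details vary by Slony version):

    cluster name = demo;
    node 1 admin conninfo = 'dbname=app host=master';
    node 2 admin conninfo = 'dbname=app host=slave1';
    init cluster (id = 1, comment = 'master');
    store node (id = 2, comment = 'slave1');
    store path (server = 1, client = 2, conninfo = 'dbname=app host=master');
    store path (server = 2, client = 1, conninfo = 'dbname=app host=slave1');
    create set (id = 1, origin = 1, comment = 'replicated tables');
    set add table (set id = 1, origin = 1, id = 1,
                   fully qualified name = 'public.accounts');
    subscribe set (id = 1, provider = 1, receiver = 2, forward = no);

As the replies in this thread note, this only replicates; the read/write split still has to happen somewhere else.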
Re: [GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
On 5/1/07, Jan Bilek [EMAIL PROTECTED] wrote: Is Slony-I capable of load balancing, and how do we set it up? We searched the web and some people mentioned that Slony-I could do load balancing, but we haven't found how to make Slony-I do it. Slony does not do load balancing. Personally, I recommend doing the load balancing in the application if possible; you will need to funnel updates to the master and distribute queries to the slaves, and the app is in the best position to determine whether something is an update or a query. Any intermediate middleware is going to add additional latency and overhead and will add just another single point of failure to an already complex system. Alexander. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
On 5/1/07, Jan Bilek [EMAIL PROTECTED] wrote: Is Slony-I capable of load balancing, and how do we set it up? We searched the web and some people mentioned that Slony-I could do load balancing, but we haven't found how to make Slony-I do it. Slony does not do load balancing. Personally, I recommend doing the load balancing in the application if possible; you will need to funnel updates to the master and distribute queries to the slaves, and the app is in the best position to determine whether something is an update or a query. Any intermediate middleware is going to add additional latency and overhead and will add just another single point of failure to an already complex system. Alexander. Thanks for info. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] PostgreSql replication and load balancing ( is Slony-I a solution?)
Alexander Staubo wrote: On 5/1/07, Jan Bilek [EMAIL PROTECTED] wrote: Is Slony-I capable of load balancing, and how do we set it up? We searched the web and some people mentioned that Slony-I could do load balancing, but we haven't found how to make Slony-I do it. You might want to check pgcluster out http://pgcluster.projects.postgresql.org/ which does both. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Postgresql Replication
Nareen S wrote: Current Setup: I am running a mail server on Postfix. For it I have configured a high-availability solution using Heartbeat and DRBD for replication. The setup is primary/secondary node servers, and all mail is replicated to the secondary using DRBD. If the primary server fails, the secondary will come up with the mail. The total number of users is 600, and the hardware used is two Compaq servers with SCSI HDDs, 1 GB RAM each, 100 Mbps network. New Planned Setup: Now I am planning a PostgreSQL server in the same setup. I want to know whether it will work, i.e. if I configure a new device in DRBD, can I get HA for PostgreSQL as well? If I use DRBD for PostgreSQL too, will it affect the existing setup? Finally, if it is not possible, what is the next option? It *should* work as long as you don't start the PostgreSQL server on the second machine until you are sure that the first has failed. It will then think it is recovering from having crashed. This is assuming that drbd doesn't mess with how fsync() works, and I seem to remember it doesn't, as of the last time I looked. The alternative (assuming you're using the database for user details/routing etc) would be replication (slony/dbmirror/etc) or perhaps even PITR, depending on the rate of change of the database. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
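To illustrate the shared-storage failover being discussed, a heartbeat v1 haresources entry might look like the following; the node name, DRBD resource, device, mount point, and the name of the PostgreSQL init script are all hypothetical, and this relies on DRBD only ever letting one node mount the data directory:

    # /etc/ha.d/haresources -- node1 is the preferred primary
    node1 drbddisk::pgdata \
          Filesystem::/dev/drbd0::/var/lib/pgsql::ext3 \
          postgresql

Heartbeat promotes the DRBD resource, mounts the filesystem, and only then starts PostgreSQL, which satisfies the don't-start-until-the-other-is-dead rule above.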
Re: [GENERAL] Postgresql Replication
Richard Huxton dev@archonet.com wrote: Nareen S wrote: Current Setup: I am running a mail server on Postfix. For it I have configured a high-availability solution using Heartbeat and DRBD for replication. The setup is primary/secondary node servers, and all mail is replicated to the secondary using DRBD. If the primary server fails, the secondary will come up with the mail. The total number of users is 600, and the hardware used is two Compaq servers with SCSI HDDs, 1 GB RAM each, 100 Mbps network. New Planned Setup: Now I am planning a PostgreSQL server in the same setup. I want to know whether it will work, i.e. if I configure a new device in DRBD, can I get HA for PostgreSQL as well? If I use DRBD for PostgreSQL too, will it affect the existing setup? Finally, if it is not possible, what is the next option? It *should* work as long as you don't start the PostgreSQL server on the second machine until you are sure that the first has failed. It will then think it is recovering from having crashed. This is simply not possible. DRBD allows access to the replicated device only on the primary node and rejects any device access on the secondary node. So the second machine doesn't even have access to the filesystem with the PostgreSQL data files. Unless, of course, one accesses the physical devices below the DRBD layer directly - but when you cheat on your replication system you're bound to get what you deserve ;-) Regards, Alex. -- Opportunity is missed by most people because it is dressed in overalls and looks like work. -- Thomas A. Edison ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[GENERAL] Postgresql Replication
Current Setup: I am running a mail server on Postfix. For it I have configured a high-availability solution using Heartbeat and DRBD for replication. The setup is primary/secondary node servers, and all mail is replicated to the secondary using DRBD. If the primary server fails, the secondary will come up with the mail. The total number of users is 600, and the hardware used is two Compaq servers with SCSI HDDs, 1 GB RAM each, 100 Mbps network. New Planned Setup: Now I am planning a PostgreSQL server in the same setup. I want to know whether it will work, i.e. if I configure a new device in DRBD, can I get HA for PostgreSQL as well? If I use DRBD for PostgreSQL too, will it affect the existing setup? Finally, if it is not possible, what is the next option? Thanks for any help. Regards, John
Re: [GENERAL] Postgresql replication
On Thu, Aug 25, 2005 at 01:44:15PM +0200, Bohdan Linda wrote: there are some other db solutions which have good performance when doing this kind of replication across the world. Bluntly, No. -- Andrew Sullivan | [EMAIL PROTECTED] Information security isn't a technological problem. It's an economics problem. --Bruce Schneier ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Postgresql replication
On Wed, Aug 24, 2005 at 11:54:42AM -0400, Chris Browne wrote: There is a Slony-II project ongoing that is trying to construct a more-or-less synchronous multimaster replication system (where part of the cleverness involves trying to get as much taking place in an asynchronous fashion as possible) that would almost certainly be of no use to your use case. Just to emphasise this point: assuming we ever get Slony-II to work, it is all but guaranteed to be useless for cases like the one that started this thread: it'll simply require very fast network connections to work. I've had more than one person ask me when multi-site multimaster is coming, and my answer is always, "Have you started work on it yet?" I think there might be a way to hack up Slony-I to do it -- Josh Berkus gave me a quick outline while at OSCON that made me think it possible -- but AFAIK, nobody is actually doing that work. It's worth noting that single-origin master-slave async replication is tricky, but by no means impossible. Multi-master _anything_ is hard, no question about it; and it more or less always imposes some overhead that you won't like. The question is merely whether you want to pay that price. A -- Andrew Sullivan | [EMAIL PROTECTED] A certain description of men are for getting out of debt, yet are against all taxes for raising money to pay it off. --Alexander Hamilton ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Postgresql replication
Chris Travers wrote: 1) Efficiency of network throughput 2) Tolerance to attempts at repeat transactions before replication (emptying an account multiple times) 3) Availability of a transaction. We ended up having to give up #1. It's possible to have our transactions routed to multiple servers before they become final. User1 might request a payment on ServerA. User2 then marks the payment as approved on ServerB. ServerC is authoritative and checks the bank/budget balances before posting as final. After each of these steps, the latest changes must be replicated to all other servers. (In theory, the first 2 steps only require replication to the authoritative server, but we do so anyways so all servers can act as backups for each other -- pending transactions still need to be restored in case of total DB failure.) There's definitely a delay in getting from point A to point Z across duplicate servers. But there's guaranteed financial integrity, users can connect to any server the load balancer picks, and no server requires any other server/connection to be up for individual user transactions to occur. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Postgresql replication
William Yu wrote: Chris Travers wrote: 1) Efficiency of network throughput 2) Tolerance to attempts at repeat transactions before replication (emptying an account multiple times) 3) Availability of a transaction. We ended up having to give up #1. It's possible to have our transactions routed to multiple servers before they become final. User1 might request a payment on ServerA. User2 then marks the payment as approved on ServerB. ServerC is authoritative and checks the bank/budget balances before posting as final. After each of these steps, the latest changes must be replicated to all other servers. (In theory, the first 2 steps only require replication to the authoritative server, but we do so anyways so all servers can act as backups for each other -- pending transactions still need to be restored in case of total DB failure.) Ok. I see where you are going with this. It is an interesting problem. Multimaster async replication will give you a problem in that it allows the attack you are describing, due to the asynchronous nature of the replication. If I were trying to prevent this sort of attack, I would try to build into this some sort of account authority which can manage these transactions. Originally I was thinking of the home server as the obvious place to start, if it is available. But if it is not, then you would need some infrastructure to track attempted withdrawals and handle them appropriately. Such servers could cache requests, and if they see duplicates or many requests coming from many servers on the same account, they could flag that. One option might be to have a rotational authority (i.e. home server, then next, then next) in a ring so that an unavailable server still allows reasonable precautions to be held against emptying the account many times. Basically, if the servers aren't talking to each other at the time of the transaction, then they are not going to know about duplicates. You have to have some setup beyond your replication to handle this. Otherwise you will have some issues with delays causing the security risks you deem unacceptable. Your question seems to be "How do I set up multimaster async replication such that a person cannot empty his account on each server?", and the answer is that this is an inherent limitation of multimaster async replication. This also means that you will have to have some sort of other verification process for such transactions beyond what is locally available on the replicants. There's definitely a delay in getting from point A to point Z across duplicate servers. But there's guaranteed financial integrity, users can connect to any server the load balancer picks, and no server requires any other server/connection to be up for individual user transactions to occur. The delay will by definition defeat any guarantee of financial integrity if you are allowing read-write operations to the replica without checking with some sort of central authority. At the very least, the central authority should look for suspicious patterns. Again, it may be possible to do some failover here, but I don't think you can do without *some* sort of centralized control. (Note, here load balancing is handled by the distribution of accounts. A down server simply means that the next server in the ring will take over its remote verification role). This doesn't make the security issue go away, but it may reduce it to an acceptable level. I.e.
it is still possible for duplicates to be submitted just before and after a home server goes down, but this is a lot better than being able to have one transaction repeated on each server and then dealing with the massively overdrawn account. Best Wishes, Chris Travers Metatron Technology Consulting ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Postgresql replication
Our own personal IM :) Chris Travers wrote: The delay will by definition defeat any guarantee of financial integrity if you are allowing read-write operations to the replica without checking with some sort of central authority. At the very least, the central authority should look for suspicious patterns. Again, it may be possible to do some failover here, but I don't think you can do without *some* sort of centralized control. Actually this is the easy part. When the home server finally issues payments, it only issues what it knows about and what can be verified as OK. Any transactions that are currently being entered on another server will appear after the next replication cycle, and they will be verified afterwards. If the previous payment-issuing cycle used up all the money, the new requests are kept pending until money is put in. This does allow newer requests that happen to be executed on home servers to possibly take precedence over old requests, but there is no requirement in the business process that payments must come out in any specific order. This doesn't make the security issue go away, but it may reduce it to an acceptable level. I.e. it is still possible for duplicates to be submitted just before and after a home server goes down, but this is a lot better than being able to have one transaction repeated on each server and then dealing with the massively overdrawn account. The home server going down is the trickiest issue. Because when a server disappears, is that because it went down temporarily? For good? A temporary internet problem where nobody can get access to it? Or an internet routing issue where just the connection between those two servers is severed? If it's the last, users might still be doing stuff on ServerA, with ServerA posting financials, while ServerB thinks the server is down and decides to take over ServerA's duties. Of course, in ServerA's view, it's ServerB and ServerC that are down -- not itself. Maybe we can mitigate this by having more servers at more data centers around the world so everybody can monitor everybody. At some point, if you have N servers and N-1 servers say ServerA is down, it probably is down. With a high enough N, ServerA could probably decisively decide it was the server severed from the internet and refuse to post any financials until connection to the outside world was restored, plus some extra threshold. This last problem, which luckily occurs rarely, we do by hand right now. We're not ready to run this on full auto because we only have 2 data centers (with multiple servers within each data center). The servers do not have enough info to know which server is actually down in order to auto-promote/demote. It does require staff that's not just in 1 location, though, because our primary office going down with our local datacenter would mean nobody there could do the switchover. (Assuming a major natural disaster that kept us from using our laptops at the local Starbucks to do the work.) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Postgresql replication
William Yu [EMAIL PROTECTED] writes: Chris Browne wrote: I'm pretty sure that they _don't_ track balance updates for each transaction that applies to a customer's account. You could, via one form of trickery or another, overdraw your account by a fairly hefty amount, and they probably won't notice for a day or even three. But once they notice/estimate that the Gentle Caller has built up some dangerously high balance, they'll warn of impending discontinuation of service if some sum isn't paid by some date. This works for companies that have some degree of power over their customers. E.g. pay up or we disconnect your service. Return your overdrafts/pay your fees or we mess up your credit. This doesn't work if it's a small company whose account has been emptied. Sure the bank will refuse to honor the check, but then that company will be hit with overdraw penalties and possible legal penalties to the payee for the bounced check. The warning threshold system is easy to implement, but there will always be corner cases where the warning is not soon enough or a single payment wipes out ~ 100% of the account. Warn too often and it'll be ignored by people as a boy-crying-wolf alarm. In a context where there is no degree of power over their customers, I would hardly think that the presence/lack of automated controls or presence/lack of balance synchronization is particularly material. In other words, if trying to apply policies is going to forcibly break, then building the data system may have been a futile exercise in the first place. And trying to distribute the system may again be a *business* mistake that admits no technical solution. If that's the case, then we can conclude that replication is no solution, and that the organization should improve connectivity for their single centralized system. That may be the case, but is the uninteresting case, as it amounts to throwing our hands up, saying no answers; you can't have a distributed system, and giving up on trying anything further. No point to further discussion... -- let name=cbbrowne and tld=acm.org in String.concat @ [name;tld];; http://cbbrowne.com/info/linux.html There is no reason anyone would want a computer in their home. -- Ken Olson, Pres. and founder of Digital Equipment Corp. 1977 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Postgresql replication
Chris Travers wrote: Why not have the people who have rights to review this all write to the master database and have that replicated back? It seems like latency is not really an issue. Replication here is only going to complicate What master database? Having a single master defeats the purpose of load balancing to handle more users. things. If it were me, I would be having my approval app pull data from *all* of the databases independently and not rely on the replication for this part. The replication could then be used to replicate *approved* data back to the slaves. If your app client happens to have high-speed access to all servers, fine. And you can guarantee uptime connections to all servers except for the rare cases of hardware failure. The problem is if you don't, you end up with every transaction running at the speed of the slowest connection between a client and the farthest DB. While the final status of a transaction does not need to show up anytime soon on a user's screen, there still needs to be fast response for each individual user action. How bad does the response get? I've done some simple tests comparing APP <-LAN-> DB versus APP <-cross-country VPN-> DB. Even simple actions like inserting a record and checking for a dupe key violation (e.g. almost no bandwidth needed) take about 10,000 times longer than over a 100mbit LAN. I still don't understand the purpose of replicating the pending data... Imagine a checking account. A request to make an online payment can be made on any server. The moment the user submits a request, he sees it on his screen. This request is considered pending and not a true transaction yet. All requests are collected together via replication, and the home server for that account can then check the account balance and decide whether there are enough funds to issue those payments. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Postgresql replication
William Yu wrote: This system sounds ok for documents and general data that can always be revived via version control/history. But I can't see how this would work for financial transactions where you're dealing with money and bank accounts. Suppose I have $100 in my account. I decide to log in to multiple servers and wire transfer $100 to another account on every server. And I hit submit at exactly the same time on every server, so each one passes the balance check. Sure, they can resolve the conflict afterwards in terms of deciding which transfer to kill off. But the fact is that my other account has that N x $100 already and I've just fleeced the bank. Hmmm... I think you should realize by now that no multimaster async replication solution is going to adequately work for something like these financial transactions. You need either synchronous or single-master, simply because you need to have a concept that there is a *single* authoritative source for the current status and balance of the account. You can't really expect async-replicated servers to know about transactions that haven't been replicated yet, can you? In other words, this problem is inherent to multimaster async replication. There is no way around it. If there were, it would either not be async or not be multimaster :-) I have built my own async multimaster replication systems for some applications. One of them is actually a system where you have two and only two masters which replicate back and forth. The trick I used here was for each master to use a different SERIAL field as its primary key so there are no duplicates. Another option is to have a compound primary key which is a serial and a server-hash (this would scale to a larger number of masters). Best Wishes, Chris Travers Metatron Technology Consulting ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
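The two key-allocation schemes mentioned above can be sketched in plain SQL. Table and sequence names are hypothetical, and the interleaved-sequence variant is a common stand-in for giving each master a distinct serial column:

    -- variant 1: disjoint ID ranges via interleaved sequences
    CREATE SEQUENCE order_id_seq INCREMENT BY 2 START WITH 1;   -- run on node A (odd IDs)
    -- CREATE SEQUENCE order_id_seq INCREMENT BY 2 START WITH 2; -- run on node B (even IDs)

    -- variant 2: compound primary key of a per-node tag plus a local serial
    CREATE TABLE orders (
        node_id  integer   NOT NULL,   -- a fixed constant on each node
        local_id bigserial NOT NULL,
        amount   numeric   NOT NULL,
        PRIMARY KEY (node_id, local_id)
    );

Either way, rows created on different masters can never collide on the primary key, which is the precondition for merging them safely.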
Re: [GENERAL] Postgresql replication
Hi, I'm thinking of testing your suggestion, basically because there are only a few sites to connect, but there are some points that aren't very clear to me. My doubts:

1. How to make a view updatable? Using the rule system?
2. Why are inserts handled differently from updates?
3. Can't I use the rule system to make updates go to the base table too (the master piece), like the inserts?
4. Are there not some performance issues related to the fact of using a table that is a union instead of a normal table?

Thanks in advance! Reimer

Jeff Davis [EMAIL PROTECTED] escreveu: Chris Browne wrote: Slony-I is a master/slave asynchronous replication system; if you already considered it unsuitable, then I see little likelihood of other systems with the same sorts of properties being suitable. What could conceivably be of use to you would be a *multimaster* asynchronous replication system. Unfortunately, multimaster *anything* is a really tough nut to crack. In general that's a difficult problem, but in practice there may be a solution. For instance, perhaps the following configuration would be helpful: Make a database for each physical server, called db1 ... dbN. Let your logical tables in each database be table1 ... tableM. Now, for each logical tableX (where 1 <= X <= M), make N physical tables, tableX_1 ... tableX_N. Now, make a view called tableX that is the UNION of tableX_1 ... tableX_N (tableX is not a real table, it's just a logical table). Now, use Slony-I. For each dbY (where 1 <= Y <= N), make dbY a master for tableX_Y (for all X where 1 <= X <= M) and a slave for tableX_Z (for all X,Z where 1 <= X <= M, Z != Y). Now, use a rule that replaces all INSERTs to tableX (where 1 <= X <= M) on dbY (where 1 <= Y <= N) with INSERTs to tableX_Y. That was my attempt at being unambiguous. In general what I mean is that each database is master of one piece of a table, and slave to all the other pieces of that table, and then you have a view which is the union of those pieces. That view is the logical table. Then have a RULE which makes INSERTs go to the physical table for which that database is master. The advantages: if one machine goes down, the rest keep going, and merely miss the updates from that one site to that table. If one machine makes an insert to the table, it quickly propagates to the other machines and transparently becomes a part of the logical tables on those machines. The disadvantages: UPDATEs are difficult, and might end up with a complicated set of rules/procedures/triggers. You may have to program the application defensively in case the database is unable to update a remote database for various reasons (if the record to be updated is a part of a table for which another database is master). Also, since the solution is asynchronous, the databases may provide different results to the same query. In general, this solution does not account for all kinds of data constraints. The conflict resolution is very simplified because it's basically just the union of data. If that union could cause a constraint violation itself, this solution might not be right for you. For instance, let's say you're tracking video rentals, and store policy says that you only rent one video per person. However, maybe they go to store 1 and rent a video, and run to store 2 and rent a video before store 1 sends the INSERT record over to store 2. Now, when they finally do attempt to UNION the data for the view, you have an inconsistent state. Many applications can get by just fine by UNIONing the data like that, and if not, perhaps work around it. I hope this is helpful. Let me know if there's some reason my plan won't work. Regards, Jeff Davis ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
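Jeff Davis's scheme quoted above reduces to a few SQL statements per node. A minimal two-node sketch (the table, view, and rule names are hypothetical); on server A you would run:

    -- each node owns exactly one physical piece
    CREATE TABLE payments_a (id text PRIMARY KEY, amt numeric NOT NULL);  -- mastered here
    CREATE TABLE payments_b (id text PRIMARY KEY, amt numeric NOT NULL);  -- replica of B's piece

    -- the logical table is the union of the pieces
    CREATE VIEW payments AS
        SELECT * FROM payments_a
        UNION ALL
        SELECT * FROM payments_b;

    -- route INSERTs on the view to the locally-mastered piece
    CREATE RULE payments_insert AS ON INSERT TO payments
        DO INSTEAD INSERT INTO payments_a (id, amt) VALUES (NEW.id, NEW.amt);

This also speaks to Reimer's first question: it is the rule, not the view itself, that makes the view insertable. UPDATEs would need similar but messier rules, one per piece, which is why they are handled differently from inserts.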
Re: [GENERAL] Postgresql replication
[EMAIL PROTECTED] (Chris Travers) writes: William Yu wrote: This system sounds ok for documents and general data that can always be revived via version control/history. But I can't see how this would work for financial transactions where you're dealing with money and bank accounts. Suppose I have $100 in my account. I decide to log in to multiple servers and wire transfer $100 to another account on every server. And I hit submit at exactly the same time on every server, so each one passes the balance check. Sure, they can resolve the conflict afterwards in terms of deciding which transfer to kill off. But the fact is that my other account has that N x $100 already and I've just fleeced the bank. Hmmm... I think you should realize by now that no multimaster async replication solution is going to adequately work for something like these financial transactions. You need either synchronous or single-master, simply because you need to have a concept that there is a *single* authoritative source for the current status and balance of the account. You can't really expect async-replicated servers to know about transactions that haven't been replicated yet, can you? Actually, I disagree. There _is_ a way to cope with that issue. The methodology involves not having a Balance column anywhere in the database; activity *always* involves recording the deltas. If the DEBIT and the CREDIT for each financial transaction are always committed, in any given place, as part of a single transaction, then there is no reason for the system to fall out of balance. This doesn't prevent the scenario of someone logging into many servers simultaneously and emptying their account multiple times. But there are other ways of addressing that problem. This sort of scenario is almost certainly the case for the stupendously enormous quantities of billing transactions that result from long distance and cellular activity in the telco industry. I'm pretty sure that they _don't_ track balance updates for each transaction that applies to a customer's account. You could, via one form of trickery or another, overdraw your account by a fairly hefty amount, and they probably won't notice for a day or even three. But once they notice/estimate that the Gentle Caller has built up some dangerously high balance, they'll warn of impending discontinuation of service if some sum isn't paid by some date. This sort of analysis does not require that any location is considered authoritative for the balance of the account. It suffices to have some sort of transaction cutoff, such that the analysis is sure to include all transactions up to *yesterday* at midnight UTC. Some bad things might take place during the up-to-24h lag; the assumption is that that won't be material, or that you can do other sorts of traffic analysis to warn of impending problem accounts... -- output = (cbbrowne @ ntlug.org) http://www.ntlug.org/~cbbrowne/sap.html ... The book [CLtL1] is about 400 pages of 8.5 by 11 Dover output. Apparently the publisher and typesetter decided that this made the lines too wide for easy reading, so they will use a 6 by 9 format. This will make the shape of the book approximately cubical. Now, there are 26 chapters counting the index, and a Rubik's cube has 26 exterior cubies. I'll let you individually extrapolate and fantasize from there. -- GLS ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
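The no-balance-column methodology is easy to picture in SQL: every financial transaction is a pair of delta rows committed atomically, and the balance is always derived. A minimal sketch with hypothetical names:

    CREATE TABLE ledger (
        txn_id bigint  NOT NULL,
        acct   integer NOT NULL,
        delta  numeric NOT NULL    -- negative = debit, positive = credit
    );

    -- the DEBIT and the CREDIT always commit together
    BEGIN;
    INSERT INTO ledger VALUES (42, 1001, -100.00);
    INSERT INTO ledger VALUES (42, 2002,  100.00);
    COMMIT;

    -- the balance is computed, never stored
    SELECT sum(delta) AS balance FROM ledger WHERE acct = 1001;

Because replication only ever ships self-balancing row pairs, the books stay consistent on every replica, even though the multiple-withdrawal problem discussed above remains.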
Re: [GENERAL] Postgresql replication
William Yu wrote: Chris Travers wrote: Why not have the people who have rights to review this all write to the master database and have that replicated back? It seems like latency is not really an issue. Replication here is only going to complicate What master database? Having a single master defeats the purpose of load balancing to handle more users. I guess I am thinking along different lines than you. I was thinking that the simplest solution would be to have master/slave replication for *approved* transactions only and no replication for initial commits prior to approval. This makes the assumption that a single transaction will be committed on a single server, and that a single transaction will not be split over multiple servers. In this way, you can commit a pending transaction to any single server, and when it is approved, it gets replicated via the master. See below for more. things. If it were me, I would be having my approval app pull data from *all* of the databases independently and not rely on the replication for this part. The replication could then be used to replicate *approved* data back to the slaves. If your app client happens to have high-speed access to all servers, fine. And you can guarantee uptime connections to all servers except for the rare cases of hardware failure. The problem is if you don't, you end up with every transaction running at the speed of the slowest connection between a client and the farthest DB. While the final status of a transaction does not need to show up anytime soon on a user's screen, there still needs to be fast response for each individual user action. Well... It depends on how it is implemented, I guess. If you pull transactional information in the background while the user is doing other things, then it shouldn't matter. Besides, what should actually happen is that your connection is only as slow as the connection to the server which hosts the pending transaction you are trying to commit at the moment. In this way, each request only goes to one server (the one which has the connection). You could probably use DBI-Link and some clever materialized views to maintain the metadata at each location without replicating the whole transaction. You could probably even use DBI-Link or dblink to pull the transactions in a transparent way. Or you could replicate transactions into a pending queue dynamically... There are all sorts of ways you could make this respond well over slow connections. Remember, PostgreSQL allows you to separate storage from presentation of the data, and this is quite powerful. How bad does the response get? I've done some simple tests comparing APP <-LAN-> DB versus APP <-cross-country VPN-> DB. Even simple actions like inserting a record and checking for a dupe key violation (e.g. almost no bandwidth needed) take about 10,000 times longer than over a 100mbit LAN. I think you could design a database such that duplicate keys are not an issue, since they only get checked on the master, and then this should never be a problem. Thinking about it, it seems here that one ends up with a sort of weird multi-master replication based on master/slave replication if you replicate these changes in the background (via another process, Notify, etc). I still don't understand the purpose of replicating the pending data... Imagine a checking account. A request to make an online payment can be made on any server. The moment the user submits a request, he sees it on his screen. This request is considered pending and not a true transaction yet.
All requests are collected together via replication, and the home server for that account can then check the account balance and decide whether there are enough funds to issue those payments. Thinking about this, the big issue is that you only want to replicate the deltas, not the entire account. I am still thinking master/slave, but something where the deltas are replicated in the background, or where the user, in checking his account, is actually querying the home server. This second issue could be done via dblink or DBI-Link and would simply require that a master table linking the accounts with home servers be replicated (this should, I think, be fairly low-overhead). Best Wishes, Chris Travers Metatron Technology Consulting ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Postgresql replication
On Thu, 2005-08-25 at 17:45 -0700, Jeff Davis wrote: The replicator surely is not optional, and must be centralized. From http://pgcluster.projects.postgresql.org/1_3/index.html: "Several replication servers can be set up. When a problem occurs at the replication server, the Cluster DB automatically changes connection to the standing-by replication server. When all replication servers stop, the Cluster DB operates in stand-alone mode. There are two modes for stand-alone mode: read-only mode, which only permits references, and read-write mode, which permits updating." PGCluster is query-based, right? Yes. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Postgresql replication
Chris Travers wrote: I guess I am thinking along different lines than you. I was thinking that the simplest solution would be to have master/slave replication for *approved* transactions only and no replication for initial commits prior to approval. This makes the assumption that a single transaction will be committed on a single server, and that a single transaction will not be split over multiple servers. In this way, you can commit a pending transaction to any single server, and when it is approved, it gets replicated via the master. See below for more. This works if you don't care that multiple servers commit transactions that force a budget or bank account to be exceeded. Thinking about this, the big issue is that you only want to replicate the deltas, not the entire account. I am still thinking master/slave, but something where the deltas are replicated in the background, or where the user, in checking his account, is actually querying the home server. This second issue could be done via dblink or DBI-Link and would simply require that a master table linking the accounts with home servers be replicated (this should, I think, be fairly low-overhead). Except what you now have is a system that fails if any server fails or is inaccessible. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Postgresql replication
William Yu wrote: Chris Travers wrote: I guess I am thinking along different lines than you. I was thinking that the simplest solution would be to have master/slave replication for *approved* transactions only and no replication for initial commits prior to approval. This makes the assumption that a single transaction will be committed on a single server, and that a single transaction will not be split over multiple servers. In this way, you can commit a pending transaction to any single server, and when it is approved, it gets replicated via the master. See below for more. This works if you don't care that multiple servers commit transactions that force a budget or bank account to be exceeded. OK, then let's look at possibility B (async multimaster replication is out). Thinking about this, the big issue is that you only want to replicate the deltas, not the entire account. I am still thinking master/slave, but something where the deltas are replicated in the background, or where the user, in checking his account, is actually querying the home server. This second issue could be done via dblink or DBI-Link and would simply require that a master table linking the accounts with home servers be replicated (this should, I think, be fairly low-overhead). Except what you now have is a system that fails if any server fails or is inaccessible. OK. If you have a system where each location is authoritative for its customers and the server transparently queries that server where needed (via a view), then when any server becomes inaccessible, the customers whose accounts are on that server become inaccessible. This may not be acceptable. But if this is the case, then you could treat this as a partitioned table, where each partition is authoritative on one location (see a previous post on how to do this) and then use Slony to replicate. Again, this does not get around your objection above, namely that it is possible to do duplicate transactions at multiple locations. For this you would need an explicit call to the authoritative server. I see no other way around that. This might allow people to view tentative balances from other branches if the home server (or its connection) is down, but they would not be able to withdraw funds. But if you do this, you have a different problem. Namely that replication will be occurring over your slower-than-desired links. As the number of branches grows, so will the bandwidth demands on every branch. This may not therefore be scalable. Unfortunately there is no free lunch here. And I think that at some point you are going to have to choose between: 1) Efficiency of network throughput 2) Tolerance to attempts at repeat transactions before replication (emptying an account multiple times) 3) Availability of a transaction. You can pick any two. I think that this is the way it will work with any other database system as well. Best Wishes, Chris Travers Metatron Technology Consulting ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [GENERAL] Postgresql replication
Carlos Henrique Reimer wrote:

> I read some documents about replication and realized that if you plan on using asynchronous replication, your application should be designed from the outset with that in mind, because asynchronous replication is not something that can be easily “added on” after the fact. Am I right?

Depending on your needs, you may find pgpool and Slony to be a workable combination. This works best when you have a lot of reads and only occasional writes: writes get redirected back to the master, and read-only transactions get run on the slaves.

Best Wishes,
Chris Travers
Metatron Technology Consulting
Re: [GENERAL] Postgresql replication
It provides pseudo relief if all your servers are in the same building. Having a front-end pgpool connector pointing to servers across the world is not workable -- performance ends up being completely decrepit due to the high latency. Which is the problem we face. Great, you've got multiple servers for failover. Too bad it doesn't do much good if your building gets hit by fire/earthquake/hurricane/etc.

Aly Dharshi wrote:

> I know I am wading into this discussion as a beginner compared to the rest who have answered this thread, but doesn't something like pgpool provide relief for pseudo-multimaster replication? And wouldn't software suites like sqlrelay help to some extent? Looking forward to being enlightened.
>
> Cheers, Aly.

William Yu wrote:

> Carlos Henrique Reimer wrote:
>
>> I read some documents about replication and realized that if you plan on using asynchronous replication, your application should be designed from the outset with that in mind, because asynchronous replication is not something that can be easily “added on” after the fact.
>
> Yes, it requires a lot of foresight to do multi-master replication -- especially across high-latency connections. I do that now for 2 different projects. We have servers across the country replicating data every X minutes, with custom app logic that resolves conflicting data.
>
> Allocation of unique IDs that don't collide across servers is a must. For one project, instead of using numeric IDs, we use CHAR and prepend a unique server code, so record #1 on server A is A01 versus ?x01 on other servers. For the other project, we were too far along in development to change all our numerics into chars, so we wrote custom sequence logic to divide our 10-billion ID space into 1-X billion for server 1, X-Y billion for server 2, etc.
>
> With this step taken, we then had to isolate (1) transactions that could run on any server w/o issue (where we always take the newest record), (2) transactions that required an amalgam of all actions, and (3) transactions that had to be limited to home servers. Record-keeping stuff where we keep a running history of all changes fell into the first category: it would have been no different from 2 users on the same server updating the same object at different times during the day. Updating of summary data fell into category #2 and required parsing the change history of individual elements. Category #3 would be financial transactions requiring strict locks; these were divided up by client/user space and restricted to the user's home server. This case would not allow auto-failover. Instead, it would require some prolonged threshold of downtime for a server before full financials are allowed on backup servers.
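As a rough sketch, both collision-avoidance schemes William describes can be expressed with ordinary PostgreSQL sequences (the server codes, table names, and range boundaries below are invented for illustration):

    -- Scheme 1: text keys with a per-server prefix, so record 1 generated on
    -- server A is 'A-1' and can never collide with 'B-1' from server B.
    CREATE SEQUENCE orders_seq;
    CREATE TABLE orders (
        id text PRIMARY KEY DEFAULT 'A-' || nextval('orders_seq')
    );

    -- Scheme 2: carve one numeric ID space into per-server ranges. Each
    -- server's sequence is confined to its own slice, and NO CYCLE makes it
    -- error out rather than silently wrap into another server's slice.
    CREATE SEQUENCE invoices_id_seq
        MINVALUE 1
        MAXVALUE 2000000000
        NO CYCLE;   -- server 2 would use 2000000001..4000000000, and so on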
Re: [GENERAL] Postgresql replication
I have a slightly offtopic question: is this an issue only for pgsql, or are there other db solutions which have good performance when doing this kind of replication across the world?

Regards,
Bohdan

On Thu, Aug 25, 2005 at 09:01:49AM +0200, William Yu wrote:

> It provides pseudo relief if all your servers are in the same building. Having a front-end pgpool connector pointing to servers across the world is not workable -- performance ends up being completely decrepit due to the high latency. Which is the problem we face. Great, you've got multiple servers for failover. Too bad it doesn't do much good if your building gets hit by fire/earthquake/hurricane/etc.
Re: [GENERAL] Postgresql replication
Bohdan Linda wrote:

> I have a slightly offtopic question: is this an issue only for pgsql, or are there other db solutions which have good performance when doing this kind of replication across the world?

It depends entirely on your application; there is no one-size-fits-all. For example, to have an online backup, WAL archiving to remote sites is often sufficient. However, you cannot have synchronous multimaster replication over slow lines and high update performance at the same time. There is always a tradeoff, in any solution (even in high-cost commercial ones), that you have to carefully consider.

> On Thu, Aug 25, 2005 at 09:01:49AM +0200, William Yu wrote:
>
>> [...] Great, you've got multiple servers for failover. Too bad it doesn't do much good if your building gets hit by fire/earthquake/hurricane/etc.

This would remove the application using that data too, or not? ;)
Re: [GENERAL] Postgresql replication
As far as I know, nobody has a generic solution for multi-master replication where servers are not in close proximity. Single-master replication? Doable. Application-specific conflict resolution? Doable. An off-the-shelf package that somehow knows financial transactions on one server shouldn't be duplicated on another? Uhh... I'd be wary of trying it out myself.

Bohdan Linda wrote:

> I have a slightly offtopic question: is this an issue only for pgsql, or are there other db solutions which have good performance when doing this kind of replication across the world?
>
> Regards,
> Bohdan
Re: [GENERAL] Postgresql replication
Another tidbit I'd like to add. What has helped a lot in implementing high-latency master-master replication is writing our software with a business-process model in mind, where data is not posted directly to the final tables. Instead, users are generally allowed to enter anything -- it could be incorrect or incomplete, or the user may not have rights -- and the data is still dumped into pending tables for people with rights to fix/review/approve it later. Only after that process is the data posted to the final tables. (Good data entered on the first try still gets pended -- the validation phase simply assumes the user who entered the data is also the one who fixed/reviewed/approved it.)

In terms of replication, this model allows users to enter data on any server. The pending records then get replicated to every server. Each specific server then looks at its own dataset of pendings to post to final tables. Final data is then replicated back to all the participating servers.

There may be a delay for the user if he/she is working on a server that doesn't have rights to post his data. However, the pending-post model gets users used to the idea that (1) data is entered in one large swoop and validated/posted afterwards, and (2) data can/will sit in pending for a period of time until it is acted upon by somebody/some server with the proper authority. Hence users aren't expecting results to pop up on the screen the moment they press the submit button.

William Yu wrote:

> Yes, it requires a lot of foresight to do multi-master replication -- especially across high-latency connections. [...]
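A minimal sketch of this pending-then-post flow in PostgreSQL terms follows; the table and column names are hypothetical, and the rights checks and validation that the real systems described here would perform are elided:

    CREATE TABLE invoices_pending (
        id          bigint PRIMARY KEY,
        home_server text NOT NULL,     -- only this server may post the row
        entered_by  text NOT NULL,
        payload     text NOT NULL,
        status      text NOT NULL DEFAULT 'pending'  -- pending | approved
    );

    CREATE TABLE invoices (
        id          bigint PRIMARY KEY,
        approved_by text NOT NULL,
        payload     text NOT NULL
    );

    -- Run only on the row's home server: promote an approved pending row to
    -- the final table, from which it replicates back to the other servers.
    CREATE FUNCTION post_invoice(p_id bigint, p_approver text) RETURNS void AS $$
    BEGIN
        INSERT INTO invoices (id, approved_by, payload)
        SELECT id, p_approver, payload
          FROM invoices_pending
         WHERE id = p_id AND status = 'pending';

        UPDATE invoices_pending SET status = 'approved' WHERE id = p_id;
    END;
    $$ LANGUAGE plpgsql;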
Re: [GENERAL] Postgresql replication
Tino Wildenhain wrote:

>> Which is the problem we face. Great, you've got multiple servers for failover. Too bad it doesn't do much good if your building gets hit by fire/earthquake/hurricane/etc.
>
> This would remove the application using that data too, or not? ;)

Yes and no. If your DB is an internal app for a company where the users are in the same building as the servers, it doesn't really matter, I guess. A meteor hitting the building would kill the users in addition to the server, so nobody will be calling you to complain about system downtime.

If your app is used by external customers who are all across the country, they want to continue to use your software even though you and data center #1 are 6 feet under due to an 8.0 earthquake. They want auto-failover to data center #2 -- which is in close proximity to CIA headquarters and other juicy terrorist targets.
Re: [GENERAL] Postgresql replication
On Thursday 25 August 2005 13:03, William Yu wrote:

> As far as I know, nobody has a generic solution for multi-master replication where servers are not in close proximity. Single-master replication? Doable. Application-specific conflict resolution? Doable. An off-the-shelf package that somehow knows financial transactions on one server shouldn't be duplicated on another? Uhh... I'd be wary of trying it out myself.

The most obvious one that does exactly this (generic multi-master replication) is Lotus Domino. It is not a relational DB, but it is not sufficiently far off to stop the analogy.

Domino marks each document with a binary value which identifies the server (built from a hash of the server name and the time the DB was created) and a timestamp of when it was last modified; each document (record) also has an ID (like OIDs). More recent versions do this at the field level as well, to avoid conflicts and speed replication.

When two servers replicate, they look for all documents modified since the last replication time and compare the lists. Those modified on only one server are copied across to the other server, replacing the old record and carrying the originating server and timestamp with them. When a document is deleted in Domino it does not actually disappear; it is reduced to a deletion stub, and this gets replicated since it has the same ID as the original record. Documents that have been modified on both sides are copied to the other DB, but both records remain and it is left to the user to resolve the conflict; field-level replication reduces the need for this considerably. Periodically the deletion stubs are purged, once all known replicas have replicated. Domino has absolutely no concept of a master DB.

Obviously this scheme would be difficult to do on a pure relational system. But with triggers and a few standard fields it would not be impossible for a limited application set. How the user would resolve conflicts would be application-specific, I suspect, and how one would handle having two versions of a record in the DB when they both have a field which is supposed to be unique is also a problem that would have to be resolved (Domino does not have the concept of unique keys).

David
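As the post suggests, a limited version of this bookkeeping could be approximated in PostgreSQL with triggers and a few standard fields. The sketch below is purely illustrative: all names are invented, and it assumes a per-session setting myapp.server_name has been SET so the trigger knows which server it is running on.

    CREATE TABLE docs (
        doc_id        uuid PRIMARY KEY,      -- replica-wide document ID
        origin_server text NOT NULL,         -- which server last modified it
        modified_at   timestamptz NOT NULL,  -- when, for replication compares
        is_deleted    boolean NOT NULL DEFAULT false,  -- the "deletion stub"
        body          text
    );

    CREATE FUNCTION stamp_doc() RETURNS trigger AS $$
    BEGIN
        -- assumes: SET myapp.server_name = '...' was issued for this session
        NEW.origin_server := current_setting('myapp.server_name');
        NEW.modified_at   := now();
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER docs_stamp
        BEFORE INSERT OR UPDATE ON docs
        FOR EACH ROW EXECUTE PROCEDURE stamp_doc();

    -- Deletions become stubs so that the deletion itself can replicate;
    -- stubs would be purged once every replica has seen them.
    CREATE RULE docs_delete AS ON DELETE TO docs
        DO INSTEAD UPDATE docs SET is_deleted = true WHERE doc_id = OLD.doc_id;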
Re: [GENERAL] Postgresql replication
Chris Travers wrote:

> Carlos Henrique Reimer wrote:
>
>> I read some documents about replication and realized that if you plan on using asynchronous replication, your application should be designed from the outset with that in mind, because asynchronous replication is not something that can be easily “added on” after the fact. Am I right?
>
> Depending on your needs, you may find pgpool and Slony to be a workable combination. This works best when you have a lot of reads and only occasional writes: writes get redirected back to the master, and read-only transactions get run on the slaves.

That holds as long as you don't have any functions that write to the db: pgpool could (and likely would) redirect some of those to the subscriber. Slony would prevent the data from being written (which would prevent the subscriber from being corrupted).

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [GENERAL] Postgresql replication
Bohdan Linda wrote:

> I have a slightly offtopic question: is this an issue only for pgsql, or are there other db solutions which have good performance when doing this kind of replication across the world?

It all depends on the quality of the connection from Node A to Node B. If connectivity is poor, then it is impossible to have good performance doing anything across that connection.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [GENERAL] Postgresql replication
William Yu wrote:

>> This would remove the application using that data too, or not? ;)
>
> Yes and no. If your DB is an internal app for a company where the users are in the same building as the servers, it doesn't really matter, I guess. A meteor hitting the building would kill the users in addition to the server, so nobody will be calling you to complain about system downtime.
>
> If your app is used by external customers who are all across the country, they want to continue to use your software even though you and data center #1 are 6 feet under due to an 8.0 earthquake. They want auto-failover to data center #2 -- which is in close proximity to CIA headquarters and other juicy terrorist targets.

Sure, but in this case a simple async master-slave setup (Slony-I) and the usual failover (also DNS failover) should be sufficient.
Re: [GENERAL] Postgresql replication
[EMAIL PROTECTED] (Aly Dharshi) writes:

> I know I am wading into this discussion as a beginner compared to the rest who have answered this thread, but doesn't something like pgpool provide relief for pseudo-multimaster replication? And wouldn't software suites like sqlrelay help to some extent? Looking forward to being enlightened.

pgpool and sqlrelay provide you a loose equivalent to synchronous replication, as they have to submit the queries to *all* of the nodes. If you can live with the long latency times that result when the nodes are widely separated, that's great.

Unfortunately, the reason people want *asynchronous* multimaster replication is that they /cannot/ afford that latency. They want to submit updates only to the local database, and have the updates head to the other servers some time later. That's why pgpool/sqlrelay aren't an answer.

--
Chris Browne, cbbrowne @ acm.org
Re: [GENERAL] Postgresql replication
[EMAIL PROTECTED] (David Goodenough) writes:

> On Thursday 25 August 2005 13:03, William Yu wrote:
>
>> As far as I know, nobody has a generic solution for multi-master replication where servers are not in close proximity. Single-master replication? Doable. Application-specific conflict resolution? Doable. An off-the-shelf package that somehow knows financial transactions on one server shouldn't be duplicated on another? Uhh... I'd be wary of trying it out myself.
>
> The most obvious one that does exactly this (generic multi-master replication) is Lotus Domino. It is not a relational DB, but it is not sufficiently far off to stop the analogy.

There's a better-known case, and that's PalmOS PalmSync. It does much the same thing as Lotus Notes(tm).

In both cases, they are doing something that works reasonably well for data that looks somewhat like documents. Conflict resolution at the document level is something that users can handle pretty well by hand. Unfortunately, for data that's more at the "hordes of little facts" level, conflict resolution is a tough problem :-(.

--
Chris Browne, cbbrowne.com
Re: [GENERAL] Postgresql replication
[EMAIL PROTECTED] (Bohdan Linda) writes:

> I have a slightly offtopic question: is this an issue only for pgsql, or are there other db solutions which have good performance when doing this kind of replication across the world?

Asynchronous multimaster replication is a generally tough problem. Oracle, Sybase, and DB2 all have methods of doing it; none are simple, straightforward, or transparent to use. It's a tough problem, in general.

--
Chris Browne, cbbrowne @ acm.org
Re: [GENERAL] Postgresql replication
Jim C. Nasby wrote:

> Or, for something far easier, try http://pgfoundry.org/projects/pgcluster/ which provides synchronous multi-master clustering.

He specifically said that pgcluster did not work for him because the databases would be at physically separate locations. PGCluster requires that there be a load balancer and a replicator, centrally located, managing the cluster. If a network problem happens at the centralized location, it would bring down all locations completely. I think he's looking for an async solution because of that. In my solution, if one location goes down, the others keep going.

Regards,
Jeff Davis
Re: [GENERAL] Postgresql replication
Exactly.

Jeff Davis [EMAIL PROTECTED] wrote:

> Jim C. Nasby wrote:
>
>> Or, for something far easier, try http://pgfoundry.org/projects/pgcluster/ which provides synchronous multi-master clustering.
>
> He specifically said that pgcluster did not work for him because the databases would be at physically separate locations. PGCluster requires that there be a load balancer and a replicator, centrally located, managing the cluster. If a network problem happens at the centralized location, it would bring down all locations completely. I think he's looking for an async solution because of that. In my solution, if one location goes down, the others keep going.
>
> Regards,
> Jeff Davis
Re: [GENERAL] Postgresql replication
>> http://pgfoundry.org/projects/pgcluster/ which provides synchronous multi-master clustering.
>
> He specifically said that pgcluster did not work for him because ... PGCluster requires that there be a load balancer and a replicator centrally located managing the cluster. If a network problem happens at the centralized location, it would bring down all locations completely.

I think the load balancer is an optional component. Clients can connect either to a load balancer or directly to a specific machine.

Also, I think pgcluster can operate async. If the system runs in "normal" mode, the client gets a response as soon as the transaction is committed on the local machine. In "reliable" mode, the client waits for the commit to happen on all machines. See http://pgcluster.projects.postgresql.org/structure_of_replication.html
Re: [GENERAL] Postgresql replication
On Thu, Aug 25, 2005 at 12:20:52PM -0400, Chris Browne wrote:

>> I have a slightly offtopic question: is this an issue only for pgsql, or are there other db solutions which have good performance when doing this kind of replication across the world?
>
> Asynchronous multimaster replication is a generally tough problem. Oracle, Sybase, and DB2 all have methods of doing it; none are simple, straightforward, or transparent to use. It's a tough problem, in general.

I've been involved in one project that did multi-master async replication across geographically separated servers. It worked, but it was operationally so horrible that I've sworn never again to architect a system that requires multi-master replication at the database level. Application level? Possibly. But not at the DB level.

Cheers,
Steve
Re: [GENERAL] Postgresql replication
David Goodenough wrote:

> The most obvious one that does exactly this (generic multi-master replication) is Lotus Domino. It is not a relational DB, but it is not sufficiently far off to stop the analogy. Domino marks each document with a binary value which identifies the server (built from a hash of the server name and the time the DB was created) and a timestamp of when it was last modified; each document (record) also has an ID (like OIDs). More recent versions do this at the field level as well, to avoid conflicts and speed replication. When two servers replicate...

This system sounds OK for documents and general data that can always be revived via version control/history. But I can't see how this would work for financial transactions where you're dealing with money and bank accounts.

Suppose I have $100 in my account. I decide to log in to multiple servers and wire transfer $100 to another account on every server, and I hit submit at exactly the same time on every server, so each one's balance check passes. Sure, they can resolve the conflict afterwards in terms of deciding which transfer to kill off. But the fact is that my other account has that N x $100 already and I've just fleeced the bank.
Re: [GENERAL] Postgresql replication
Carlos Henrique Reimer wrote:

> Exactly.

Was there something lacking in my suggested solution at http://archives.postgresql.org/pgsql-general/2005-08/msg01240.php ? It's a little complicated to administer, but it seems well-suited to a company that has several locations that want to share data without being too dependent upon each other.

Regards,
Jeff Davis
Re: [GENERAL] Postgresql replication
[EMAIL PROTECTED] (Brad Nicholson) writes:

> Bohdan Linda wrote:
>
>> I have a slightly offtopic question: is this an issue only for pgsql, or are there other db solutions which have good performance when doing this kind of replication across the world?
>
> It all depends on the quality of the connection from Node A to Node B. If connectivity is poor, then it is impossible to have good performance doing anything across that connection.

The nifty magic of asynchronous multimaster would be that you'd be able to have fast access at Site Alpha to Node Alpha, as well as fast access at Site Beta to Node Beta. That would, indeed, be valuable. That's why it's one of those Holy Grail things... It's Rather Hard, which is why there isn't a "Slony-III: The Wrath of Async MultiMaster" just yet.

It would be valuable for someone to look into a replication system to support that sort of scenario. With the stipulation that there be some big, fiery dragons there ready to eat you and your data :-).

--
Chris Browne, cbbrowne @ ntlug.org
Re: [GENERAL] Postgresql replication
[EMAIL PROTECTED] (Steve Atkins) writes:

> I've been involved in one project that did multi-master async replication across geographically separated servers. It worked, but it was operationally so horrible that I've sworn never again to architect a system that requires multi-master replication at the database level. Application level? Possibly. But not at the DB level.

I have heard much the same argument thrown at attempts to use things like CORBA to let you pretend that there is no difference between local and remote access to things. There are some good arguments to be made there. You really need to access remote data in different ways than local data, because the latency *will* kill you if you ignore it.

It's a neat idea to try to find ways to hide those differences under the carpet, but it is quite likely that the better way will involve addressing them somewhere in the application, as opposed to trying to get the DB (or the ORB or other "distributed thingie") to do it for you.

Tables that contain balances (e.g. credit balances, account balances, and such) would be a good example. You do NOT want to distribute the process of updating balances across some hideous-latency Great Divide link. That heads, of course, to application design, not to pure DB-level activity...

--
Chris Browne, cbbrowne @ cbbrowne.com
Re: [GENERAL] Postgresql replication
William Yu [EMAIL PROTECTED] writes:

> This system sounds OK for documents and general data that can always be revived via version control/history. But I can't see how this would work for financial transactions where you're dealing with money and bank accounts. Suppose I have $100 in my account. I decide to log in to multiple servers and wire transfer $100 to another account on every server, and I hit submit at exactly the same time on every server. Sure, they can resolve the conflict afterwards in terms of deciding which transfer to kill off. But the fact is that my other account has that N x $100 already and I've just fleeced the bank.

There are two pretty reasonable ways to address this:

1. Your application does not replicate balances; those are always computed locally. Only *transactions* are relayed.

2. Balance updates take place as a special kind of delta update, where replication transfers around (new.balance - old.balance) instead of either of the absolute values.

Either of these leads to the nifty case where the bank discovers that you have overdrawn your account, and then takes whatever actions it deems appropriate next. Such as billing you $29 for each overdraft transaction. You're hardly going to be retiring to the Bahamas on N x $100, are you :-).

--
Chris Browne, cbbrowne @ ntlug.org
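A sketch of option 2 in trigger form follows. The tables and trigger are hypothetical illustrations; a real system would also ship the account_deltas rows to the other nodes and apply them there.

    CREATE TABLE accounts (
        account_id integer PRIMARY KEY,
        balance    numeric NOT NULL DEFAULT 0
    );

    -- Replication payload: deltas, not absolute balances.
    CREATE TABLE account_deltas (
        account_id integer NOT NULL,
        delta      numeric NOT NULL,
        logged_at  timestamptz NOT NULL DEFAULT now()
    );

    CREATE FUNCTION log_balance_delta() RETURNS trigger AS $$
    BEGIN
        IF NEW.balance <> OLD.balance THEN
            INSERT INTO account_deltas (account_id, delta)
            VALUES (NEW.account_id, NEW.balance - OLD.balance);
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER accounts_delta
        AFTER UPDATE ON accounts
        FOR EACH ROW EXECUTE PROCEDURE log_balance_delta();

The point of the scheme is that applying a remote delta commutes with local updates, e.g. UPDATE accounts SET balance = balance + delta, so overlapping activity on two nodes converges to the same (possibly overdrawn) balance rather than one node's absolute value clobbering the other's.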
Re: [GENERAL] Postgresql replication
William Yu wrote:

> Another tidbit I'd like to add. What has helped a lot in implementing high-latency master-master replication is writing our software with a business-process model in mind, where data is not posted directly to the final tables. Instead, users are generally allowed to enter anything -- it could be incorrect or incomplete, or the user may not have rights -- and the data is still dumped into pending tables for people with rights to fix/review/approve it later. Only after that process is the data posted to the final tables. (Good data entered on the first try still gets pended -- the validation phase simply assumes the user who entered the data is also the one who fixed/reviewed/approved it.)

In this case, why have multimaster replication at all? Why not have the people who have rights to review all this write to the master database and have that replicated back? It seems like latency is not really an issue here, and replication is only going to complicate things. If it were me, I would have my approval app pull data from *all* of the databases independently and not rely on the replication for this part. The replication could then be used to replicate *approved* data back to the slaves.

> There may be a delay for the user if he/she is working on a server that doesn't have rights to post his data. However, the pending-post model gets users used to the idea that (1) data is entered in one large swoop and validated/posted afterwards, and (2) data can/will sit in pending for a period of time until it is acted upon by somebody/some server with the proper authority. Hence users aren't expecting results to pop up on the screen the moment they press the submit button.

I still don't understand the purpose of replicating the pending data...

Best Wishes,
Chris Travers
Metatron Technology Consulting
Re: [GENERAL] Postgresql replication
Matt Miller wrote:

>>> http://pgfoundry.org/projects/pgcluster/ which provides synchronous multi-master clustering.
>>
>> He specifically said that pgcluster did not work for him because ... PGCluster requires that there be a load balancer and a replicator centrally located managing the cluster. If a network problem happens at the centralized location, it would bring down all locations completely.
>
> I think the load balancer is an optional component. Clients can connect either to a load balancer or directly to a specific machine.

The replicator surely is not optional, and it must be centralized.

> Also, I think pgcluster can operate async. If the system runs in "normal" mode, the client gets a response as soon as the transaction is committed on the local machine. In "reliable" mode, the client waits for the commit to happen on all machines.

Interesting. I suppose whatever works for your application is the right answer for replication. PGCluster is query-based, right? I suppose the question would then be whether he would rather use a trigger-based replication solution, like Slony, or a query-based one.

Regards,
Jeff Davis
[GENERAL] Postgresql replication
Hello,

Currently we have only one database, accessed by the headquarters and two branches, but the performance in the branches is very poor and I was asked to find a way to improve it.

One possible solution is to replicate the headquarters DB into the two branches. I read about Slony-I, but then the replicated DBs will be read-only. Pgcluster is a sync solution, and I think it is not fit for us because the replicated DBs will be located remotely and we have a lot of updates on the DBs.

I think I'm looking for a master-slave asynchronous solution. I know pgReplicator can do it, but I think the project is not active any more. Are there other solutions?

Thanks in advance!

Reimer
Re: [GENERAL] Postgresql replication
On Wednesday, 24 August 2005 14:21, Carlos Henrique Reimer wrote:

> One possible solution is to replicate the headquarters DB into the two branches. I read about Slony-I, but then the replicated DBs will be read-only.

That's because it's master-slave replication. If you could sync the slave back to the master, it would be a master itself.

> I think I'm looking for a master-slave asynchronous solution. I know pgReplicator can do it, but I think the project is not active any more.

But Slony does master/slave replication.

Michael

--
Michael Meskes
Re: [GENERAL] Postgresql replication
Michael Meskes wrote:

>> One possible solution is to replicate the headquarters DB into the two branches. I read about Slony-I, but then the replicated DBs will be read-only.
>
> That's because it's master-slave replication. If you could sync the slave back to the master, it would be a master itself.
>
>> I think I'm looking for a master-slave asynchronous solution. I know pgReplicator can do it, but I think the project is not active any more.
>
> But Slony does master/slave replication.

i think carlos is confused about master-slave vs. multi-master. carlos, master-slave async is much easier than multi-master async. yes, pgreplicator can in theory do multi-master async in a restricted sort of way (the generalized multi-master async problem is fairly intractable), but you're right in that pgreplicator is not maintained anymore (and moreover, it depends on a no-longer-maintained dialect of tcl.)

if i had infinite time and energy, i'd be working on a reimplementation of pgreplicator in C, but i don't have either, and i don't see anyone offering to pay me to do it, so it'll stay on the list of wanna-do projects for the time being.

richard
Re: [GENERAL] Postgresql replication
[EMAIL PROTECTED] (Carlos Henrique Reimer) writes:

> Currently we have only one database, accessed by the headquarters and two branches, but the performance in the branches is very poor and I was asked to find a way to improve it. One possible solution is to replicate the headquarters DB into the two branches. I read about Slony-I, but then the replicated DBs will be read-only.

Correct.

> Pgcluster is a sync solution, and I think it is not fit for us because the replicated DBs will be located remotely and we have a lot of updates on the DBs.

Unfortunately, pgcluster isn't much maintained anymore.

> I think I'm looking for a master-slave asynchronous solution. I know pgReplicator can do it, but I think the project is not active any more.

Slony-I is a master/slave asynchronous replication system; if you already considered it unsuitable, then I see little likelihood of other systems with the same sorts of properties being suitable. What could conceivably be of use to you would be a *multimaster* asynchronous replication system. Unfortunately, multimaster *anything* is a really tough nut to crack.

There is an ongoing Slony-II project that is trying to construct a more-or-less synchronous multimaster replication system (where part of the cleverness involves trying to get as much as possible to take place in an asynchronous fashion); it would almost certainly be of no use to your use case. The most successful multimaster asynchronous replication system that I am aware of is the Palm Computing PalmSync system.

It would presumably be possible to use some of the components of Slony-I to construct a multimaster async replication system. A prerequisite would be the creation of some form of distributed sequence which would try to minimize the conflicts that arise out of auto-generation of sequence numbers.

But beyond that lies the larger challenge of conflict resolution. Slony-I, as a single-master system, does not need to address conflicts, as changes must be made on the master and propagate elsewhere. Synchronous multimaster systems address conflicts by detecting them when they occur and rejecting one or another of the conflicting transactions. Asynchronous multimaster systems require some sort of conflict management/resolution system for situations where tuples are being concurrently updated on multiple nodes. How that is managed is, well, troublesome :-(. The PalmSync approach is that if it finds conflicts, it duplicates records and leaves you, the user, to clean things up. That may not be suitable for every kind of application...

--
Chris Browne, cbbrowne @ ntlug.org
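One common way to build the kind of distributed sequence mentioned above (the node count and offset below are assumptions for illustration, not anything prescribed by Slony-I) is to have every node increment by the cluster size but start at its own node number, so the generated IDs interleave without ever colliding:

    -- On node 2 of a 4-node cluster: generates 2, 6, 10, 14, ...
    -- Node 3 would use START WITH 3 and generate 3, 7, 11, 15, ...
    CREATE SEQUENCE widgets_id_seq
        START WITH 2
        INCREMENT BY 4;

    CREATE TABLE widgets (
        id   bigint PRIMARY KEY DEFAULT nextval('widgets_id_seq'),
        name text
    );

This avoids ID collisions, but as the post notes it does nothing for the harder problem of concurrent conflicting updates to the same row.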
Re: [GENERAL] Postgresql replication
I read some documents about replication and realized that if you plan on using asynchronous replication, your application should be designed from the outset with that in mind, because asynchronous replication is not something that can be easily "added on" after the fact. Am I right?

Reimer
Re: [GENERAL] Postgresql replication
Carlos Henrique Reimer wrote:

> I read some documents about replication and realized that if you plan on using asynchronous replication, your application should be designed from the outset with that in mind, because asynchronous replication is not something that can be easily "added on" after the fact. Am I right?

certainly, if your goal is pgreplicator-style multi-master async, this is correct, as you have to make decisions about the direction of data flow, id generation, and conflict resolution up front. if you want slony-I style single master/multi slave, you don't have to do so much advance thinking, as records are only being inserted into the system on the single master.

richard
Re: [GENERAL] Postgresql replication
Chris Browne wrote:

> Slony-I is a master/slave asynchronous replication system; if you already considered it unsuitable, then I see little likelihood of other systems with the same sorts of properties being suitable. What could conceivably be of use to you would be a *multimaster* asynchronous replication system. Unfortunately, multimaster *anything* is a really tough nut to crack.

In general that's a difficult problem, but in practice there may be a solution. For instance, perhaps the following configuration would be helpful:

Make a database for each physical server, called db1 ... dbN. Let your logical tables in each database be table1 ... tableM. Now, for each logical tableX (where 1 <= X <= M), make N physical tables, tableX_1 ... tableX_N. Then make a view called tableX that is the UNION of tableX_1 ... tableX_N (tableX is not a real table, it's just a logical table).

Now, use Slony-I. For each dbY (where 1 <= Y <= N), make dbY a master for tableX_Y (for all X where 1 <= X <= M) and a slave for tableX_Z (for all X, Z where 1 <= X <= M and Z != Y). Finally, use a rule that replaces all INSERTs to tableX (where 1 <= X <= M) on dbY (where 1 <= Y <= N) with INSERTs to tableX_Y.

That was my attempt at being unambiguous. In general, what I mean is that each database is master of one piece of a table and slave to all the other pieces of that table, and then you have a view which is the union of those pieces. That view is the logical table. Then have a RULE which makes INSERTs go to the physical table for which that database is master, as shown in the sketch below.

The advantages: if one machine goes down, the rest keep going, and merely miss the updates from that one site to that table. If one machine makes an insert to the table, it quickly propagates to the other machines and transparently becomes a part of the logical tables on those machines.

The disadvantages: UPDATEs are difficult, and might end up needing a complicated set of rules/procedures/triggers. You may have to program the application defensively in case the database is unable to update a remote database for various reasons (if the record to be updated is part of a table for which another database is master). Also, since the solution is asynchronous, the databases may give different results to the same query.

In general, this solution does not account for all kinds of data constraints. The conflict resolution is very simplified because it's basically just the union of data. If that union could cause a constraint violation itself, this solution might not be right for you. For instance, let's say you're tracking video rentals, and store policy says that you only rent one video per person. Someone could go to store 1 and rent a video, then run to store 2 and rent another video before store 1 sends the INSERT record over to store 2. When the data is finally UNIONed for the view, you have an inconsistent state. Many applications can get by just fine by UNIONing the data like that, and if not, perhaps work around it.

I hope this is helpful. Let me know if there's some reason my plan won't work.

Regards,
Jeff Davis
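A concrete sketch of this plan for one logical table on node 1 of a two-node cluster follows (the names are illustrative; Slony-I would separately be configured so that this node is the origin for customers_1 and a subscriber for customers_2):

    CREATE TABLE customers_1 (id bigint PRIMARY KEY, name text);  -- mastered here
    CREATE TABLE customers_2 (id bigint PRIMARY KEY, name text);  -- replica of node 2

    -- The logical table is the union of the per-node pieces. UNION ALL
    -- suffices because the pieces' ID spaces are kept disjoint.
    CREATE VIEW customers AS
        SELECT * FROM customers_1
        UNION ALL
        SELECT * FROM customers_2;

    -- INSERTs against the logical table land in the piece this node masters.
    CREATE RULE customers_insert AS ON INSERT TO customers
        DO INSTEAD INSERT INTO customers_1 VALUES (NEW.id, NEW.name);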
Re: [GENERAL] Postgresql replication
Jeff Davis writes:

>>> I hope this is helpful. Let me know if there's some reason my plan won't work.
>>
>> look at the solution in pgreplicator. site ids are embedded in the id columns in the tables, so there are only m tables, and a bit less insanity.
>
> That doesn't work with Slony-I, unfortunately. I don't know much about pgreplicator, but if it does something similar to what I'm talking about, maybe it's a good thing to look into.

it'd be an excellent thing to look into if it were in any way supported or maintained. it's a dead project (unfortunately.)

richard
Re: [GENERAL] Postgresql replication
Welty, Richard wrote:

>> The disadvantages: ...
>
> one more: if you actually have m tables and n servers, you have m x n tables in reality, which is pretty miserable scaling behavior. i should think that rules, triggers, and embedded procedures would explode in complexity rather rapidly. i know i wouldn't want to administer one of these if there were a lot of sites.

True, but in practice N will usually be fairly reasonable. In particular, his setup sounded like it would involve only a few servers. Also, you're really talking about scalability of administration; I don't think performance will be significantly impacted.

>> I hope this is helpful. Let me know if there's some reason my plan won't work.
>
> look at the solution in pgreplicator. site ids are embedded in the id columns in the tables, so there are only m tables, and a bit less insanity.

That doesn't work with Slony-I, unfortunately. I don't know much about pgreplicator, but if it does something similar to what I'm talking about, maybe it's a good thing to look into.

Regards,
Jeff Davis