Re: [HACKERS] Core team statement on replication in PostgreSQL
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> Agreed. I realize why we are not zeroing those bytes (for
>> performance), but can't we have the archiver zero those bytes before
>> calling the 'archive_command'?
>
> The archiver doesn't know any more about where the end-of-data is than
> the archive_command does. Moreover, the archiver doesn't know whether
> the archive_command cares. I think the separate module is a fine
> solution.
>
> It should also be pointed out that the whole thing becomes
> uninteresting if we get real-time log shipping implemented. So I see
> absolutely no point in spending time integrating pg_clearxlogtail now.

People doing PITR are still going to be saving these files, and for a
long time, so I think this is still something we should try to address.
Added to TODO:

  o Reduce PITR WAL file size by removing full page writes and by
    removing trailing bytes to improve compression

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
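The second TODO item is easy to picture outside the server. A minimal sketch in plain Python, assuming the logical end-of-data offset is already known (the real pg_clearxlogtail has to parse WAL page headers to find it, which is exactly the information the archiver lacks):

```python
import os
import zlib

WAL_SEG_SIZE = 16 * 1024 * 1024  # default WAL segment size

def clear_tail(seg: bytes, end_of_data: int) -> bytes:
    """Zero everything past the logical end of data, keeping file size."""
    return seg[:end_of_data] + b"\x00" * (len(seg) - end_of_data)

# Simulate a recycled segment: 64 kB of current records followed by a
# tail of stale, incompressible bytes from the segment's previous life.
end_of_data = 64 * 1024
seg = os.urandom(WAL_SEG_SIZE)

dirty = len(zlib.compress(seg))
clean = len(zlib.compress(clear_tail(seg, end_of_data)))
print(dirty > 10 * clean)  # True: the zeroed tail compresses away
```

The point of the demo is only the ratio: the archive copy stays 16 MB on disk, but once the stale tail is zeroed it costs almost nothing to compress and ship.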
Re: [HACKERS] Core team statement on replication in PostgreSQL
Bruce Momjian wrote:
> Added to TODO:
>
>   o Reduce PITR WAL file size by removing full page writes and by
>     removing trailing bytes to improve compression

If we remove full page writes, how does hint bit setting get propagated
to the slave?

--
Alvaro Herrera  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Core team statement on replication in PostgreSQL
Alvaro Herrera wrote:
> Bruce Momjian wrote:
>> Added to TODO:
>>
>>   o Reduce PITR WAL file size by removing full page writes and by
>>     removing trailing bytes to improve compression
>
> If we remove full page writes, how does hint bit setting get
> propagated to the slave?

We would remove full page writes that are needed for crash recovery,
but perhaps keep other full pages.

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Core team statement on replication in PostgreSQL
Bruce Momjian wrote:
> Alvaro Herrera wrote:
>> If we remove full page writes, how does hint bit setting get
>> propagated to the slave?
>
> We would remove full page writes that are needed for crash recovery,
> but perhaps keep other full pages.

How do you tell which is which?

--
Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Core team statement on replication in PostgreSQL
Alvaro Herrera wrote:
> Bruce Momjian wrote:
>> We would remove full page writes that are needed for crash recovery,
>> but perhaps keep other full pages.
>
> How do you tell which is which?

The WAL format would have to be modified to indicate which entries can
be discarded.

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
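Nothing in the current WAL record header says "this full-page image is only needed for crash recovery, safe to strip for archiving", so the flag bit below is purely hypothetical; the sketch just illustrates the filtering Bruce is describing over a toy record stream:

```python
from dataclasses import dataclass

# Hypothetical info bit: full-page image needed only for local crash
# recovery, safe to strip before archiving/shipping. No such bit exists
# in the real WAL record header.
XLR_DISCARDABLE = 0x01

@dataclass
class WalRecord:
    info: int
    payload: bytes

def strip_for_archive(records):
    """Drop records the (modified) WAL format marks as discardable."""
    return [r for r in records if not (r.info & XLR_DISCARDABLE)]

stream = [
    WalRecord(0, b"heap insert"),
    WalRecord(XLR_DISCARDABLE, b"full-page image"),
    WalRecord(0, b"commit"),
]
print([r.payload for r in strip_for_archive(stream)])
# [b'heap insert', b'commit']
```

The hard part, per Alvaro's question, is deciding at WAL-insert time which full-page images get the bit — e.g. hint-bit-only page images could not be stripped if a slave depends on them.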
Re: [HACKERS] Core team statement on replication in PostgreSQL
>>> On Mon, Jun 9, 2008 at 9:48 PM, Greg Smith <[EMAIL PROTECTED]> wrote:
> On Mon, 9 Jun 2008, Tom Lane wrote:
>> It should also be pointed out that the whole thing becomes
>> uninteresting if we get real-time log shipping implemented. So I see
>> absolutely no point in spending time integrating pg_clearxlogtail now.
>
> There are remote replication scenarios over a WAN (mainly aimed at
> disaster recovery) that want to keep a fairly updated database without
> putting too much traffic over the link. People in that category really
> want zeroed tail+compressed archives, but probably not the extra
> overhead that comes with shipping smaller packets in a real-time
> implementation.

We ship the WAL files over a (relatively) slow WAN for disaster
recovery purposes, and we would be fine with replacing our current
techniques with real-time log shipping as long as:

(1) We can do it asynchronously. (i.e., we don't have to wait for WAN
    latency to commit transactions.)

(2) It can ship to multiple targets. (Management dictates that we have
    backups at the site of origin as well as our central site. A
    failure to replicate to one must not delay the other.)

(3) It doesn't consume substantially more WAN bandwidth overall.

A solution which fails to cover any of these leaves pg_clearxlogtail
interesting to us.

-Kevin
Re: [HACKERS] Core team statement on replication in PostgreSQL
Greg Smith <[EMAIL PROTECTED]> writes:
> On Mon, 9 Jun 2008, Tom Lane wrote:
>> It should also be pointed out that the whole thing becomes
>> uninteresting if we get real-time log shipping implemented. So I see
>> absolutely no point in spending time integrating pg_clearxlogtail now.
>
> There are remote replication scenarios over a WAN (mainly aimed at
> disaster recovery) that want to keep a fairly updated database without
> putting too much traffic over the link. People in that category really
> want zeroed tail+compressed archives, but probably not the extra
> overhead that comes with shipping smaller packets in a real-time
> implementation.

Instead of zeroing bytes and depending on compression, why not just
pass an extra parameter to the archive command with the offset of the
logical end of data? The archive_command could just copy from the start
to that point and not bother transferring the rest.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
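No released PostgreSQL passes such an offset to archive_command, so the %e placeholder imagined below is hypothetical; the sketch only shows the copy-a-prefix mechanics of Stark's proposal:

```python
import os
import tempfile

def archive_prefix(src_path: str, dst_path: str, end_of_data: int) -> int:
    """Copy only the first end_of_data bytes of a WAL segment, as a
    hypothetical 'archive_command = archive_prefix %p /archive/%f %e'
    might do if a %e (logical end of data) placeholder existed."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read(end_of_data))
    return end_of_data

# Demo: a toy "segment" whose last 400 bytes are unused tail.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "000000010000000000000001")
with open(src, "wb") as f:
    f.write(b"W" * 600 + b"\x00" * 400)

copied = archive_prefix(src, src + ".partial", 600)
print(copied, os.path.getsize(src + ".partial"))  # 600 600
```

Note the trade-off Heikki points out next: a restore_command would also have to pad the file back out to 16 MB, and the archiver process does not actually know the offset to pass.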
Re: [HACKERS] Core team statement on replication in PostgreSQL
Gregory Stark wrote:
> Instead of zeroing bytes and depending on compression why not just
> pass an extra parameter to the archive command with the offset to the
> logical end of data.

Because the archiver process doesn't have that information.

--
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] Core team statement on replication in PostgreSQL
All,

>> For the slave to not interfere with the master at all, we would need
>> to delay application of WAL files on each slave until visibility on
>> that slave allows the WAL to be applied, but in that case we would
>> have long-running transactions delay data visibility of all slave
>> sessions.
>
> Right, but you could segregate out long-running queries to one slave
> server that could be further behind than the others.

I still see having 2 different settings:

Synchronous: XID visibility is pushed to the master. Maintains
synchronous failover, and users are expected to run *1* master to *1*
slave for most installations.

Asynchronous: replication stops on the slave whenever minxid gets out
of sync. Could have multiple slaves, but noticeable lag between master
and slave.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [HACKERS] Core team statement on replication in PostgreSQL
Josh Berkus <[EMAIL PROTECTED]> wrote:
> I still see having 2 different settings:
>
> Synchronous: XID visibility is pushed to the master. Maintains
> synchronous failover, and users are expected to run *1* master to *1*
> slave for most installations.
>
> Asynchronous: replication stops on the slave whenever minxid gets out
> of sync. Could have multiple slaves, but noticeable lag between master
> and slave.

I agree with you that we should have a sync/async option in log
shipping. We could also have another setting: synchronous shipping with
asynchronous flushing. We won't lose transactions unless both servers
go down at once, and we can avoid the delay of flushing WAL files to
the primary's disks.

As for multiple slaves, we could have a cascading configuration: the
WAL receiver also delivers WAL records to other servers. I think it is
simplest for the postgres core to have only one-to-one replication,
with multiple slaves supported by third-party WAL receivers.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Re: [HACKERS] Core team statement on replication in PostgreSQL
Gurjeet Singh wrote:
> On Fri, May 30, 2008 at 10:40 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> But since you mention it: one of the plausible answers for fixing the
>> vacuum problem for read-only slaves is to have the slaves push an
>> xmin back upstream to the master to prevent premature vacuuming. The
>> current design of pg_standby is utterly incapable of handling that
>> requirement. So there might be an implementation dependency there,
>> depending on how we want to solve that problem.
>
> I think it would be best to not make the slave interfere with the
> master's operations; that's only going to increase the operational
> complexity of such a solution. There could be multiple slaves
> following a master, some serving

For the slave to not interfere with the master at all, we would need to
delay application of WAL files on each slave until visibility on that
slave allows the WAL to be applied, but in that case we would have
long-running transactions delay data visibility of all slave sessions.

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Core team statement on replication in PostgreSQL
Andreas 'ads' Scherbaum wrote:
> On Fri, 30 May 2008 16:22:41 -0400 (EDT) Greg Smith wrote:
>> On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:
>>> Then you ship 16 MB binary stuff every 30 second or every minute but
>>> you only have some kbyte real data in the logfile.
>>
>> Not if you use pg_clearxlogtail
>> ( http://www.2ndquadrant.com/replication.htm ), which got lost in the
>> giant March commitfest queue but should probably wander into contrib
>> as part of 8.4.
>
> Yes, this topic was discussed several times in the past but to solve
> this it needs a patch/solution which is integrated into PG itself, not
> contrib.

Agreed. I realize why we are not zeroing those bytes (for performance),
but can't we have the archiver zero those bytes before calling the
'archive_command'?

--
Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Core team statement on replication in PostgreSQL
Bruce Momjian wrote:
> Agreed. I realize why we are not zeroing those bytes (for
> performance), but can't we have the archiver zero those bytes before
> calling the 'archive_command'?

Perhaps make the zeroing user-settable.

--
Alvaro Herrera  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] Core team statement on replication in PostgreSQL
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Gurjeet Singh wrote:
>> There could be multiple slaves following a master, some serving
>
> For the slave to not interfere with the master at all, we would need
> to delay application of WAL files on each slave until visibility on
> that slave allows the WAL to be applied, but in that case we would
> have long-running transactions delay data visibility of all slave
> sessions.

Right, but you could segregate out long-running queries to one slave
server that could be further behind than the others.

			regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Agreed. I realize why we are not zeroing those bytes (for
> performance), but can't we have the archiver zero those bytes before
> calling the 'archive_command'?

The archiver doesn't know any more about where the end-of-data is than
the archive_command does. Moreover, the archiver doesn't know whether
the archive_command cares. I think the separate module is a fine
solution.

It should also be pointed out that the whole thing becomes
uninteresting if we get real-time log shipping implemented. So I see
absolutely no point in spending time integrating pg_clearxlogtail now.

			regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Mon, 9 Jun 2008, Tom Lane wrote:
> It should also be pointed out that the whole thing becomes
> uninteresting if we get real-time log shipping implemented. So I see
> absolutely no point in spending time integrating pg_clearxlogtail now.

There are remote replication scenarios over a WAN (mainly aimed at
disaster recovery) that want to keep a fairly updated database without
putting too much traffic over the link. People in that category really
want zeroed tail + compressed archives, but probably not the extra
overhead that comes with shipping smaller packets in a real-time
implementation.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] Core team statement on replication in PostgreSQL
Just for information: in terms of archive compression, I have an
archive log compression tool, which can be found at
http://pgfoundry.org/projects/pglesslog/

This feature is also included in NTT's synchronous log shipping
replication presented at the last PGCon.

2008/6/10 Greg Smith <[EMAIL PROTECTED]>:
> On Mon, 9 Jun 2008, Tom Lane wrote:
>> It should also be pointed out that the whole thing becomes
>> uninteresting if we get real-time log shipping implemented. So I see
>> absolutely no point in spending time integrating pg_clearxlogtail now.
>
> There are remote replication scenarios over a WAN (mainly aimed at
> disaster recovery) that want to keep a fairly updated database without
> putting too much traffic over the link. People in that category really
> want zeroed tail+compressed archives, but probably not the extra
> overhead that comes with shipping smaller packets in a real-time
> implementation.

--
Koichi Suzuki
Re: [HACKERS] Core team statement on replication in PostgreSQL
Jeff Davis <[EMAIL PROTECTED]> writes:
> On Wed, 2008-06-04 at 14:17 +0300, Heikki Linnakangas wrote:
>>> Would that also cover possible differences in page size, 32-bit OS
>>> vs. 64-bit OS, different timestamp flavour, etc. issues? AFAIR, all
>>> these things can have an influence on how the data is written and
>>> possibly make the WAL incompatible with other postgres instances,
>>> even of the exact same version...
>>
>> These are already covered by the information in pg_control.
>
> Another thing that can change between systems is the collation
> behavior, which can corrupt indexes (and other bad things).

Well, yes and no. It's entirely possible, for example, for a minor
release of an OS to tweak the collation rules for a collation without
changing the name. For the sake of argument, they might just be fixing
a bug in the collation rules. From the point of view of the OS that's a
minor bug fix that they might not foresee causing data corruption
problems.

Pegging pg_control to a particular release of the OS would be pretty
terrible though. I don't really see an out for this. But it's another
roadblock to consider, akin to not-really-immutable index expressions,
for any proposal which depends on re-finding index pointers :(

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hello Andrew,

Andrew Sullivan wrote:
> Yes. And silent as ever.

:-) Are the slides of your PgCon talk available for download somewhere?

BTW: up until recently, there was yet another mailing list:
[EMAIL PROTECTED] It was less focused on hooks and got at least some
traffic. :-) Are those mails still archived somewhere?

Regards

Markus
Re: [HACKERS] Core team statement on replication in PostgreSQL
Stephen Denne wrote:
> Hannu Krosing wrote:
>> The simplest form of synchronous wal shipping would not even need
>> postgresql running on slave, just a small daemon which reports when
>> wal blocks are a) received and b) synced to disk.
>
> While that does sound simple, I'd presume that most people would want
> the guarantee of the same version of postgresql installed wherever the
> logs are ending up, with the log receiver speaking the same protocol
> version as the log sender. I imagine that would be most easily
> achieved through using something like the continuously restoring
> startup mode of current postgresql.

Hmm, WAL version compatibility is an interesting question. Most minor
releases haven't changed the WAL format, and it would be nice to allow
running different minor versions in the master and slave in those
cases. But it's certainly not unheard of to change the WAL format.
Perhaps we should introduce a WAL version number, similar to catalog
version?

--
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] Core team statement on replication in PostgreSQL
> Hmm, WAL version compatibility is an interesting question. Most minor
> releases haven't changed the WAL format, and it would be nice to allow

As I remember, a higher minor version should be able to read all WAL
from lower ones, but the opposite isn't true, and neither is reading
across different major versions.

> running different minor versions in the master and slave in those
> cases. But it's certainly not unheard of to change the WAL format.
> Perhaps we should introduce a WAL version number, similar to catalog
> version?

Agreed. Right now this only touches warm-standby servers, but
introducing simple log shipping, and replication based on it, will
cause a lot of unobvious errors/bugs. Is it possible to use the catalog
version number as the WAL version?

--
Teodor Sigaev
E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Wed, 2008-06-04 at 11:13 +0300, Heikki Linnakangas wrote:
> Hmm, WAL version compatibility is an interesting question. Most minor
> releases haven't changed the WAL format, and it would be nice to allow
> running different minor versions in the master and slave in those
> cases. But it's certainly not unheard of to change the WAL format.
> Perhaps we should introduce a WAL version number, similar to catalog
> version?

Would that also cover possible differences in page size, 32-bit OS vs.
64-bit OS, different timestamp flavour, etc.? AFAIR, all these things
can have an influence on how the data is written and possibly make the
WAL incompatible with other postgres instances, even of the exact same
version...

Cheers,
Csaba.
Re: [HACKERS] Core team statement on replication in PostgreSQL
Csaba Nagy wrote:
> Would that also cover possible differences in page size, 32-bit OS vs.
> 64-bit OS, different timestamp flavour, etc.? AFAIR, all these things
> can have an influence on how the data is written and possibly make the
> WAL incompatible with other postgres instances, even of the exact same
> version...

These are already covered by the information in pg_control.

--
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] Core team statement on replication in PostgreSQL
Teodor Sigaev wrote:
> Is it possible to use the catalog version number as the WAL version?

No, because we don't change the catalog version number in minor
releases, even though we might change the WAL format.

--
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Wed, Jun 04, 2008 at 09:24:20AM +0200, Markus Schiltknecht wrote:
> Are the slides of your PgCon talk available for download somewhere?

There weren't any slides, really (there were 4 that I put up in case
the cases I was discussing needed back-references, but they didn't).
Joshua tells me that I'm supposed to make the paper readable and put it
up on Command Prompt's website, so I will soon.

> BTW: up until recently, there was yet another mailing list:
> [EMAIL PROTECTED] It was less focused on hooks and got at least some
> traffic. :-) Are those mails still archived somewhere?

Unless whoever was operating that list moved it to pgfoundry, I doubt
it (except on backups somewhere).

A

--
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/
Re: [HACKERS] Core team statement on replication in PostgreSQL
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Hmm, WAL version compatibility is an interesting question. Most minor
> releases haven't changed the WAL format, and it would be nice to allow
> running different minor versions in the master and slave in those
> cases. But it's certainly not unheard of to change the WAL format.
> Perhaps we should introduce a WAL version number, similar to catalog
> version?

Yeah, perhaps. In the past we've changed the WAL page ID field for
this; I'm not sure if that's enough or not. It does seem like a good
idea to have a way to check that the slaves aren't trying to read a WAL
version they don't understand.

Also, it's possible that the WAL format doesn't change across a major
update, but you still couldn't work with say an 8.4 master and an 8.3
slave, so maybe we need the catalog version ID in there too.

			regards, tom lane
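Read together, the two paragraphs above suggest a compatibility check over a WAL format version plus the catalog version, while tolerating minor-release skew. A toy sketch of such a policy (the field names, the example values, and the rule itself are assumptions; nothing like this exists in the tree):

```python
from typing import NamedTuple

class NodeVersion(NamedTuple):
    release: str      # major release, e.g. "8.4"
    minor: int        # e.g. 1 for 8.4.1
    catversion: int   # catalog version number (illustrative value below)
    wal_version: int  # hypothetical WAL format version

def can_follow(master: NodeVersion, slave: NodeVersion) -> bool:
    """Slave may apply the master's WAL only if the WAL format and the
    catalog version match; minor releases are allowed to differ."""
    return (master.release == slave.release
            and master.catversion == slave.catversion
            and master.wal_version == slave.wal_version)

m = NodeVersion("8.4", 0, 200806040, 1)
print(can_follow(m, m._replace(minor=3)))        # True: minor skew tolerated
print(can_follow(m, m._replace(wal_version=2)))  # False: WAL format changed
```

As the thread goes on to note, this still cannot catch a minor release that *does* change the WAL format, which is the case Heikki raises against reusing catversion alone.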
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Wed, 2008-06-04 at 10:40 -0400, Tom Lane wrote:
> Yeah, perhaps. In the past we've changed the WAL page ID field for
> this; I'm not sure if that's enough or not. It does seem like a good
> idea to have a way to check that the slaves aren't trying to read a
> WAL version they don't understand.
>
> Also, it's possible that the WAL format doesn't change across a major
> update, but you still couldn't work with say an 8.4 master and an 8.3
> slave, so maybe we need the catalog version ID in there too.

And something dependent on datetime being integer. We probably won't
need to encode the presence of user-defined types, like PostGIS, will
we?

- Hannu
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hannu Krosing <[EMAIL PROTECTED]> writes:
> And something dependent on datetime being integer.

This thread is getting out of hand, actually. Heikki's earlier comment
about pg_control reminded me that we already have a unique system
identifier stored in pg_control, and check that against WAL headers. So
I think we already have enough certainty that the master and slaves
have the same pg_control, and hence are the same for everything checked
by pg_control.

However, since by definition pg_control doesn't change in a minor
upgrade, there isn't any easy way to enforce a rule like "slaves must
be the same or a newer minor version than the master". I'm not sure
that we actually *want* to enforce such a rule, though. Most of the
time, the other way around would work fine.

			regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Wed, 2008-06-04 at 11:37 -0400, Tom Lane wrote:
> This thread is getting out of hand, actually.

Agreed. We should start new threads for specific things. Please.

> However, since by definition pg_control doesn't change in a minor
> upgrade, there isn't any easy way to enforce a rule like "slaves must
> be the same or a newer minor version than the master". I'm not sure
> that we actually *want* to enforce such a rule, though.

Definitely don't want to prevent minor version mismatches. We want to
be able to upgrade a standby, have it catch up with the master, then
switch over to the new version. Otherwise we'd have to take the whole
replicated system down to do minor upgrades/backouts. Ugh!

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Wed, 2008-06-04 at 14:17 +0300, Heikki Linnakangas wrote:
>> Would that also cover possible differences in page size, 32-bit OS
>> vs. 64-bit OS, different timestamp flavour, etc.? AFAIR, all these
>> things can have an influence on how the data is written and possibly
>> make the WAL incompatible with other postgres instances, even of the
>> exact same version...
>
> These are already covered by the information in pg_control.

Another thing that can change between systems is the collation
behavior, which can corrupt indexes (and other bad things).

Regards,
Jeff Davis
Re: [HACKERS] Core team statement on replication in PostgreSQL
Jeff Davis <[EMAIL PROTECTED]> writes:
> On Wed, 2008-06-04 at 14:17 +0300, Heikki Linnakangas wrote:
>> These are already covered by the information in pg_control.
>
> Another thing that can change between systems is the collation
> behavior, which can corrupt indexes (and other bad things).

That is covered by pg_control, at least to the extent of forcing the
same value of LC_COLLATE.

			regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Wed, 2008-06-04 at 14:23 -0400, Tom Lane wrote:
> That is covered by pg_control, at least to the extent of forcing the
> same value of LC_COLLATE.

But the same LC_COLLATE means different things on different systems.
Even en_US means something different on Mac versus Linux.

Regards,
Jeff Davis
Re: [HACKERS] Core team statement on replication in PostgreSQL
Well, the WAL format doesn't only depend on WAL itself; it also depends
on each resource manager. If we introduce WAL format version
identification, ISTM that we have to take care of matching the resource
managers in the master and the slave as well.

2008/6/4 Heikki Linnakangas <[EMAIL PROTECTED]>:
> Hmm, WAL version compatibility is an interesting question. Most minor
> releases haven't changed the WAL format, and it would be nice to allow
> running different minor versions in the master and slave in those
> cases. But it's certainly not unheard of to change the WAL format.
> Perhaps we should introduce a WAL version number, similar to catalog
> version?

--
Koichi Suzuki
Re: [HACKERS] Core team statement on replication in PostgreSQL
Koichi Suzuki [EMAIL PROTECTED] writes: Well, WAL format doesn't only depend on WAL itself, but also depends on each resource manager. If we introduce WAL format version identification, ISTM that we have to take care of the matching of resource managers in the master and the slave as well. That seems a bit overdesigned. What are the prospects that two builds of the same Postgres version are going to have different sets of resource managers in them? regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
If the versions of the master and the slave are different and we'd still like to allow log-shipping replication, we need to negotiate whether the WAL formats of the two are compatible. I hope this is not in our scope and I'm worrying too much. 2008/6/5 Tom Lane [EMAIL PROTECTED]: Koichi Suzuki [EMAIL PROTECTED] writes: Well, WAL format doesn't only depend on WAL itself, but also depends on each resource manager. If we introduce WAL format version identification, ISTM that we have to take care of the matching of resource managers in the master and the slave as well. That seems a bit overdesigned. What are the prospects that two builds of the same Postgres version are going to have different sets of resource managers in them? regards, tom lane -- Koichi Suzuki
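The negotiation Koichi describes could be as simple as comparing a small tuple of version identifiers during the streaming handshake. A minimal sketch, assuming a hypothetical WAL-format number of the kind Heikki proposes (the constants, field names and resource-manager set here are illustrative, not PostgreSQL's actual ones):

```python
# Hypothetical handshake check: before streaming begins, master and slave
# compare version identifiers and refuse to replicate across an
# incompatible WAL format.  All names here are assumptions for the sketch.
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeVersion:
    major: int            # e.g. 8
    minor: int            # e.g. 4
    wal_version: int      # hypothetical WAL-format number, bumped on format changes
    rmgr_ids: frozenset   # resource managers this build knows about

def wal_compatible(master: NodeVersion, slave: NodeVersion) -> bool:
    """Streaming is allowed only if the WAL format matches and the slave
    knows every resource manager the master may emit records for.
    Minor releases that did not bump wal_version stay compatible."""
    return (master.wal_version == slave.wal_version
            and master.rmgr_ids <= slave.rmgr_ids)

master = NodeVersion(8, 4, 83, frozenset({"heap", "btree", "xlog"}))
slave_ok = NodeVersion(8, 4, 83, frozenset({"heap", "btree", "xlog"}))
slave_bad = NodeVersion(8, 3, 82, frozenset({"heap", "btree", "xlog"}))

assert wal_compatible(master, slave_ok)
assert not wal_compatible(master, slave_bad)
```

This keeps Tom's point intact: within one release the resource-manager set check is essentially free, and the wal_version comparison alone carries the weight.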
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Mon, 2008-06-02 at 22:40 +0200, Andreas 'ads' Scherbaum wrote: On Mon, 02 Jun 2008 11:52:05 -0400 Chris Browne wrote: [EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB binary stuff every 30 second or every minute but you only have some kbyte real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (means: no high-speed connection, maybe even pay-per-traffic) to the slave. If you have that kind of scenario, then you have painted yourself into a corner, and there isn't anything that can be done to extract you from it. You are misunderstanding something. It's perfectly possible that you have a low-traffic database with changes every now and then. But you have to copy a full 16 MB logfile every 30 seconds or every minute just to have the slave up-to-date. To repeat my other post in this thread: Actually we can already do better than file-by-file by using pg_xlogfile_name_offset() which was added sometime in 2006. walmgr.py from SkyTools package for example does this to get no more than a few seconds failure window and it copies just the changed part of WAL to slave. pg_xlogfile_name_offset() was added just for this purpose - to enable WAL shipping scripts to query, where inside the logfile current write pointer is. It is not synchronous, but it can be made very close, within subsecond if you poll it frequently enough. --- Hannu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
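The partial-copy trick Hannu attributes to walmgr.py can be illustrated in a few lines: poll the master for the current write position and ship only the newly written tail of the segment, instead of waiting for the whole 16 MB file. This is a toy simulation with ordinary files, not walmgr.py itself; the offsets stand in for what pg_xlogfile_name_offset() would report.

```python
# Toy sketch of partial WAL shipping: each poll copies only the bytes
# written since the previous poll.  File names and offsets are
# illustrative stand-ins for pg_xlogfile_name_offset() output.
import os
import tempfile

def copy_tail(src_path, dst_path, last_offset, cur_offset):
    """Append bytes [last_offset, cur_offset) of the segment to the copy."""
    if cur_offset <= last_offset:
        return last_offset                      # nothing new since the last poll
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(last_offset)
        dst.write(src.read(cur_offset - last_offset))
    return cur_offset

# Simulate two polls against a growing "segment" file.
workdir = tempfile.mkdtemp()
seg = os.path.join(workdir, "000000010000000000000042")
copy = os.path.join(workdir, "000000010000000000000042.partial")
with open(seg, "wb") as f:
    f.write(b"A" * 100)                          # first burst of WAL records
open(copy, "wb").close()

off = copy_tail(seg, copy, 0, 100)               # poll 1: ship 100 bytes
with open(seg, "ab") as f:
    f.write(b"B" * 50)                           # more records arrive
off = copy_tail(seg, copy, off, 150)             # poll 2: ship only the new 50
```

Polling like this every second or so gives the sub-second failure window Hannu mentions, at the cost of one extra query per poll on the master.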
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sun, Jun 01, 2008 at 01:43:22PM -0400, Tom Lane wrote: power to him. (Is the replica-hooks-discuss list still working?) But Yes. And silent as ever. :-) A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hannu Krosing wrote: The simplest form of synchronous wal shipping would not even need postgresql running on slave, just a small daemon which reports when wal blocks are a) received and b) synced to disk. While that does sound simple, I'd presume that most people would want the guarantee of the same version of postgresql installed wherever the logs are ending up, with the log receiver speaking the same protocol version as the log sender. I imagine that would be most easily achieved through using something like the continuously restoring startup mode of current postgresql. However variations on this kind of daemon can be used to perform testing, configuring it to work well, go slow, pause, not respond, disconnect, or fail in particular ways, emulating disk full, etc. Regards, Stephen Denne.
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hi Hannu, On 6/1/08 2:14 PM, Hannu Krosing [EMAIL PROTECTED] wrote: As a consequence, I don't see how you can get around doing some sort of row-based replication like all the other databases. Isn't WAL-based replication some sort of row-based replication? Yes, in theory. However, there's a big difference between replicating physical WAL records and doing logical replication with SQL statements. Logical replication requires extra information to reconstruct primary keys. (Somebody tell me if this is already in the WAL; I'm learning the code as fast as possible but assuming for now it's not.) Now that people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication. Triggers. See pgQ's logtrigga()/logutrigga(). See slides for Marko Kreen's presentation at pgCon08. Thanks very much for the pointer. The slides look interesting. Robert
Re: [HACKERS] Core team statement on replication in PostgreSQL
[EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB binary stuff every 30 second or every minute but you only have some kbyte real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (means: no high-speed connection, maybe even pay-per-traffic) to the slave. If you have that kind of scenario, then you have painted yourself into a corner, and there isn't anything that can be done to extract you from it. Consider: If you have so much update traffic that it is too much to replicate via WAL-copying, why should we expect that other mechanisms *wouldn't* also overflow the connection? If you haven't got enough network bandwidth to use this feature, then nobody is requiring that you use it. It seems like a perfectly reasonable prerequisite to say this requires that you have enough bandwidth. -- (reverse (concatenate 'string ofni.secnanifxunil @ enworbbc)) http://www3.sympatico.ca/cbbrowne/ There's nothing worse than having only one drunk head. -- Zaphod Beeblebrox -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 2008-05-29 at 23:37 +0200, Mathias Brossard wrote: I pointed out that the NTT solution is synchronous because Tom said in the first part of his email that: In practice, simple asynchronous single-master-multiple-slave replication covers a respectable fraction of use cases, so we have concluded that we should allow such a feature to be included in the core project. ... and yet the most appropriate base technology for this is synchronous and maybe what I should have also pointed out in my previous mail is that it doesn't support multiple slaves. I don't think that you need too many slaves in sync mode. Probably first slave sync and the others async from there on will be good enough. Also, as others have pointed out, there are different interpretations of synchronous depending on whether the WAL data has reached the other end of the network connection, a safe disk checkpoint or the slave DB itself. Probably all of DRBD's levels (A) data sent to network, B) data received, C) data written to disk) should be supported, plus C1) data replayed in the slave DB, C1 meaning that it can be done in parallel with C. Then each DBA can set it up depending on what he trusts - network, slave's power supply or slave's disk. Also, the case of slave failure should be addressed. I don't think that the best solution is halting all ops on the master if the slave/network fails. Maybe we should also allow a setup with 2-3 slaves, where operations can continue when at least 1 slave is syncing? -- Hannu
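The acknowledgement levels Hannu lists (DRBD's A/B/C plus a C1 "replayed on slave" level) boil down to an ordered scale: a commit on the master may return once the slave has acknowledged at least the configured level. A minimal sketch, with the caveat that no such setting exists in PostgreSQL at this point and all names are illustrative:

```python
# Ordered acknowledgement levels, corresponding to DRBD's A, B, C
# plus the proposed C1 ("replayed in slave DB").  Illustrative only.
LEVELS = ["sent", "received", "synced_to_disk", "replayed"]  # A, B, C, C1

def commit_complete(required, acked):
    """True once the slave has acknowledged at least the required level,
    so a stronger ack always satisfies a weaker requirement."""
    return LEVELS.index(acked) >= LEVELS.index(required)

assert commit_complete("sent", "received")                 # B satisfies A
assert commit_complete("synced_to_disk", "synced_to_disk") # C satisfies C
assert not commit_complete("replayed", "synced_to_disk")   # C does not satisfy C1
```

The DBA-facing knob would then just be the `required` level, matching Hannu's point that each admin can pick what they trust: the network, the slave's power supply, or the slave's disk.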
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Mon, 02 Jun 2008 11:52:05 -0400 Chris Browne wrote: [EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB binary stuff every 30 second or every minute but you only have some kbyte real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (means: no high-speed connection, maybe even pay-per-traffic) to the slave. If you have that kind of scenario, then you have painted yourself into a corner, and there isn't anything that can be done to extract you from it. You are misunderstanding something. It's perfectly possible that you have a low-traffic database with changes every now and then. But you have to copy a full 16 MB logfile every 30 seconds or every minute just to have the slave up-to-date. Consider: If you have so much update traffic that it is too much to replicate via WAL-copying, why should we expect that other mechanisms *wouldn't* also overflow the connection? For a few MB of real data you copy several GB of logfiles per day - that's a lot of overhead, isn't it? If you haven't got enough network bandwidth to use this feature, then nobody is requiring that you use it. It seems like a perfectly reasonable prerequisite to say this requires that you have enough bandwidth. If you have a high-traffic database, then of course you need a different connection than if you only have a low-traffic or a mostly read-only database. But that's not the point. Copying an almost unused 16 MB WAL logfile is just overhead - especially because the logfile is not very compressible, because of all the leftovers from earlier use. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hi Merlin, My point here is that with reasonably small extensions to the core you can build products that are a lot better than SLONY. Triggers do not cover DDL, among other issues, and it's debatable whether they are the best way to implement quorum policies like Google's semi-synchronous replication. As I mentioned separately this topic deserves another thread which I promise to start. It is of course possible to meet some of these needs with an appropriate client interface to WAL shipping. There's no a-priori reason why built-in PostgreSQL slaves need to be the only client. I would put a vote in for covering this possibility in the initial replication design. We are using a very similar approach in our own master/slave replication product. Thanks, Robert P.S., No offense intended to Jan Wieck et al. There are some pretty cool things in SLONY. On 5/29/08 8:16 PM, Merlin Moncure [EMAIL PROTECTED] wrote: On Thu, May 29, 2008 at 3:05 PM, Robert Hodges [EMAIL PROTECTED] wrote: Third, you can't stop with just this feature. (This is the BUT part of the post.) The use cases not covered by this feature area actually pretty large. Here are a few that concern me: 1.) Partial replication. 2.) WAN replication. 3.) Bi-directional replication. (Yes, this is evil but there are problems where it is indispensable.) 4.) Upgrade support. Aside from database upgrade (how would this ever really work between versions?), it would not support zero-downtime app upgrades, which depend on bi-directional replication tricks. 5.) Heterogeneous replication. 6.) Finally, performance scaling using scale-out over large numbers of replicas. I think it's possible to get tunnel vision on this-it's not a big requirement in the PG community because people don't use PG in the first place when they want to do this. They use MySQL, which has very good replication for performance scaling, though it's rather weak for availability. These type of things are what Slony is for. Slony is trigger based. 
This makes it more complex than log shipping style replication, but provides lots of functionality. wal shipping based replication is maybe the fastest possible solution...you are already paying the overhead so it comes virtually for free from the point of view of the master. mysql replication is imo nearly worthless from backup standpoint. merlin -- Robert Hodges, CTO, Continuent, Inc. Email: [EMAIL PROTECTED] Mobile: +1-510-501-3728 Skype: hodgesrm
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sun, Jun 1, 2008 at 11:58 AM, Robert Hodges [EMAIL PROTECTED] wrote: Hi Merlin, My point here is that with reasonably small extensions to the core you can build products that are a lot better than SLONY. Triggers do not cover DDL, among other issues, and it's debatable whether they are the best way to implement quorum policies like Google's semi-synchronous replication. As I mentioned separately this topic deserves another thread which I promise to start. These issues are much discussed and well understood. At this point, the outstanding points of discussion are technical...how to make this thing work. merlin
Re: [HACKERS] Core team statement on replication in PostgreSQL
Merlin Moncure [EMAIL PROTECTED] writes: On Sun, Jun 1, 2008 at 11:58 AM, Robert Hodges [EMAIL PROTECTED] wrote: My point here is that with reasonably small extensions to the core you can build products that are a lot better than SLONY. These issues are much discussed and well understood. Well, what we know is that previous attempts to define replication hooks to be added to the core have died for lack of interest. Maybe Robert can start a new discussion that will actually get somewhere; if so, more power to him. (Is the replica-hooks-discuss list still working?) But that is entirely orthogonal to what is proposed in this thread, which is to upgrade the existing PITR support into a reasonably useful replication feature. regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, May 29, 2008 at 4:12 PM, Tom Lane [EMAIL PROTECTED] wrote: The Postgres core team met at PGCon to discuss a few issues, the largest of which is the need for simple, built-in replication for PostgreSQL. [...] We believe that the most appropriate base technology for this is probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon. We hope that such a feature can be completed for 8.4. Ideally this would be coupled with the ability to execute read-only queries on the slave servers, but we see technical difficulties that might prevent that from being completed before 8.5 or even further out. (The big problem is that long-running slave-side queries might still need tuples that are vacuumable on the master, and so replication of vacuuming actions would cause the slave's queries to deliver wrong answers.) Again, this will not replace Slony, pgPool, Continuent, Londiste, or other systems for many users, as it will not be highly scalable nor support long-distance replication nor replicating less than an entire installation. But it is time to include a simple, reliable basic replication feature in the core system. Hello! I thought I would share a few thoughts of my own about the issue. I have hands-on experience with Oracle and MySQL apart from PostgreSQL, so I hope it will be a bit interesting. The former has a feature called physical standby, which looks quite like our WAL-shipping based replication. Simply put, archived logs are replayed on the standby database. A primary database and standby database are connected, and can stream the logs directly. They either copy the log when it's finished (as we do now) or can do it in a continuous manner (as I hope we will be able to). It is possible to have synchronous replication (where COMMIT on the primary database succeeds when the data is safely stored on the standby database). I think such a feature would be a great advantage for PostgreSQL (where you cannot afford to lose any transactions).
Their standby database is not accessible. It can be opened read-only, but during that time replication stops. So PostgreSQL having read-only and still replicating standby database would be great. The other method is logical standby which works by dissecting WAL-logs and recreating DDLs/DMLs from it. Never seen anyone use it. ;-) Then we have a mysql replication -- done by replaying actual DDLs/DMLs on the slaves. This approach has issues, most notably when slaves are highly loaded and lag behind the master -- so you end up with infrastructure to monitor lags and turn off slaves which lag too much. Also it is painful to setup -- you have to stop, copy, configure and run. * Back to PostgreSQL world As for PostgreSQL solutions we have a slony-I, which is great as long as you don't have too many people managing the database and/or your schema doesn't change too frequently. Perhaps it would be maintainable more easily if there would be to get DDLs (as DDL triggers or similar). Its main advantages for me is ability to prepare complex setups and easily add new slaves). The pgpool solution is quite nice but then again adding a new slave is not so easy. And being a filtering layer between client and server it feels a bit fragile (I know it is not, but then again it is harder to convince someone that yes it will work 100% right all the time). * How I would like PostgreSQL WAL-replication to evolve: First of all it would be great if a slave/standby would contact the master and maintain the state with it (tell it its xmin, request a log to stream, go online-streaming). Especially I hope that it should be possible to make a switchover (where the two databases exchange roles), and in this the direct connection between the two should help. In detail, I think it should go like this: * A slave database starts up, checks that it works as a replica (hopefully it would not be a postgresql.conf constant, but rather some file maintained by the database). 
* It would connect to the master database, tell where in the WAL it is now, and request a log N.
* If log N is not available, request a log from an externally supplied script (so that it could be fetched from a log archive repository somewhere, recovered from a backup tape, etc.).
* Continue asking until we get to the logs which are available at the master database.
* Continue replaying until we get within max_allowed_replication_lag time, and open our slave for read-only queries.
* If we start lagging too much perhaps close the read-only access to the database (perhaps configurable?).
I think that replication should be easy to set up. I think our archive_command is quite easy, but many a person comes with a lot of misconceptions about how it works (and it takes time to explain to them how it actually works, especially what archive_command is for, and that pg_start_backup() doesn't actually _do_ a backup, but just tells PostgreSQL that a backup is being done). Easy to set up and easy to switch over (change the
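The catch-up sequence in the steps above can be sketched as a small loop: prefer segments the master can still stream, fall back to the external restore script for recycled ones, and open for read-only queries once the replay lag drops under max_allowed_replication_lag. This is a toy model with integer segment numbers and in-memory sets standing in for the master and the archive; none of these names exist in PostgreSQL.

```python
# Toy model of the proposed slave catch-up loop.  Segments are integers;
# `master` and `archive` are stand-ins for the streaming connection and
# the externally supplied restore script.
def catch_up(slave_pos, master, archive, master_pos, max_lag):
    read_only_open = False
    while master_pos - slave_pos > 0:
        n = slave_pos + 1
        if n in master:                 # master can still stream this segment
            slave_pos = n
        elif n in archive:              # recovered via the external script
            slave_pos = n
        else:
            raise RuntimeError("segment %d unavailable anywhere" % n)
        if master_pos - slave_pos <= max_lag:
            read_only_open = True       # close enough: allow read-only queries
    return slave_pos, read_only_open

master = {8, 9, 10}                     # segments the master can still stream
archive = {5, 6, 7}                     # older segments in the log archive
pos, ro = catch_up(4, master, archive, master_pos=10, max_lag=2)
assert pos == 10 and ro
```

A real implementation would of course re-poll the master's write position instead of using a fixed `master_pos`, and would re-close read-only access if the lag grows again, as the last bullet suggests.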
Re: [HACKERS] Core team statement on replication in PostgreSQL
David Fetter wrote: This part is a deal-killer. It's a giant up-hill slog to sell warm standby to those in charge of making resources available because the warm standby machine consumes SA time, bandwidth, power, rack space, etc., but provides no tangible benefit, and this feature would have exactly the same problem. IMHO, without the ability to do read-only queries on slaves, it's not worth doing this feature at all. That's not something that squares with my experience *at all*, which admittedly is entirely in investment banks. Business continuity is king, and in some places the warm standby rep from the database vendor is trusted more than block-level rep from the SAN vendor (though that may be changing to some extent in favour of the SAN). James
Re: [HACKERS] Core team statement on replication in PostgreSQL
Aidan Van Dyk wrote: The whole single-threaded WAL replay problem is going to rear its ugly head here too, and mean that a slave *won't* be able to keep up with a busy master if it's actually trying to apply all the changes in real-time. Is there a reason to commit at the same points that the master committed? Wouldn't relaxing that mean that at least you would get 'big' commits and some economy of scale? It might not be too bad. All I can say is that Sybase warm standby is useful, even though the rep for an update that changes a hundred rows is a hundred updates keyed on primary key, which is pretty sucky in terms of T-SQL performance.
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 2008-05-29 at 13:37 -0400, Tom Lane wrote: David Fetter [EMAIL PROTECTED] writes: On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote: The only question I have is... what does this give us that PITR doesn't give us? It looks like a wrapper for PITR to me, so the gain would be ease of use. A couple of points about that: * Yeah, ease of use is a huge concern here. We're getting beat up because people have to go find a separate package (and figure out which one they want), install it, learn how to use it, etc. It doesn't help that the most mature package is Slony which is, um, not very novice-friendly or low-admin-complexity. I personally got religion on this about two months ago when Red Hat switched their bugzilla from Postgres to MySQL because the admins didn't want to deal with Slony any more. People want simple. * The proposed approach is trying to get to real replication incrementally. Getting rid of the loss window involved in file-by-file log shipping is step one, Actually we can already do better than file-by-file by using pg_xlogfile_name_offset() which was added sometime in 2006. SkyTools for example does this to get no more than a few seconds failure window. Doing this synchronously would of course be better. Probably we should use the same modes/protocols as DRBD when determining when a sync WAL write is done. Quote from http://www.slackworks.com/~dkrovich/DRBD/usingdrbdsetup.html#AEN76:

Table 1. DRBD Protocols
* Protocol A: a write operation is complete as soon as the data is written to disk and sent to the network.
* Protocol B: a write operation is complete as soon as a reception acknowledgement arrives.
* Protocol C: a write operation is complete as soon as a write acknowledgement arrives.

There are also additional parameters you can pass to the disk and net options. See the drbdsetup man page for additional information. /end quote

And I suspect that step two is going to be fixing performance issues in WAL replay to ensure that slaves can keep up.
After that we'd start thinking about how to let slaves run read-only queries. But even without read-only queries, this will be a useful improvement for HA/backup scenarios. regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 2008-05-30 at 15:16 -0400, Robert Treat wrote: On Friday 30 May 2008 01:10:20 Tom Lane wrote: Greg Smith [EMAIL PROTECTED] writes: I fully accept that it may be the case that it doesn't make technical sense to tackle them in any order besides sync-read-only slaves because of dependencies in the implementation between the two. Well, it's certainly not been my intention to suggest that no one should start work on read-only-slaves before we finish the other part. The point is that I expect the log shipping issues will be done first because they're easier, and it would be pointless to not release that feature if we had it. But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. The current design of pg_standby is utterly incapable of handling that requirement. So there might be an implementation dependency there, depending on how we want to solve that problem. Sure, but whose to say that after synchronous wal shipping is finished it wont need a serious re-write due to new needs from the hot standby feature. I think going either way carries some risk. The simplest form of synchronous wal shipping would not even need postgresql running on slave, just a small daemon which reports when wal blocks are a) received and b) synced to disk. This setup would just guarantee no data loss on single machine failure. form there on you could add various features, including support for both switchover and failover, async replication to multiple slaves, etc. the only thing that needs anything additional from slave wal-receiving daemon is when you want the kind of wal-sync which would guarantee that read-only query on slave issued after commit returns from master sees latest data. 
for these kinds of guarantees you need at least feedback about WAL replay, but possibly also shared transaction numbers and shared snapshots, to be sure that OLTP-type queries see the latest data and OLAP queries are not denied data VACUUMed on the master. -- Hannu
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 2008-05-29 at 12:05 -0700, Robert Hodges wrote: Hi everyone, First of all, I'm absolutely delighted that the PG community is thinking seriously about replication. Second, having a solid, easy-to-use database availability solution that works more or less out of the box would be an enormous benefit to customers. Availability is the single biggest problem for customers in my experience and as other people have commented the alternatives are not nice. It's an excellent idea to build off an existing feature: PITR is already pretty useful and the proposed features are solid next steps. The fact that it does not solve all problems is not a drawback but means it's likely to get done in a reasonable timeframe. Third, you can't stop with just this feature. (This is the BUT part of the post.) The use cases not covered by this feature are actually pretty large. Here are a few that concern me: 1.) Partial replication. 2.) WAN replication. 1.) and 2.) are better done async; that's the domain of Slony-I/Londiste. 3.) Bi-directional replication. (Yes, this is evil but there are problems where it is indispensable.) Sure, but it is also a lot harder and always has several dimensions (performance/availability/locking) which play against each other. 4.) Upgrade support. Aside from database upgrade (how would this ever really work between versions?), it would not support zero-downtime app upgrades, which depend on bi-directional replication tricks. Or you could use zero-downtime app upgrades, which don't depend on this :P 5.) Heterogeneous replication. 6.) Finally, performance scaling using scale-out over large numbers of replicas. I think it's possible to get tunnel vision on this: it's not a big requirement in the PG community because people don't use PG in the first place when they want to do this. They use MySQL, which has very good replication for performance scaling, though it's rather weak for availability.
Again, doing scale-out over large numbers of replicas should either be async or, for sync, use some broadcast channel to all slaves (and it would still be a performance problem on the master, as it has to wait for the slowest slave). As a consequence, I don't see how you can get around doing some sort of row-based replication like all the other databases. Isn't WAL-based replication some sort of row-based replication? Now that people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication. Triggers. See pgQ's logtrigga()/logutrigga(). See slides for Marko Kreen's presentation at pgCon08. (Another approach would be to make it possible for third party apps to read the logs and regenerate SQL.) Which logs? WAL or SQL command logs? There are existing models for how to do change set extraction; we have done it several times at my company already. There are also research projects like GORDA that have looked fairly comprehensively at this problem. pgQ with its triggers does a pretty good job of change-set extraction. -- Hannu
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Friday 30 May 2008, Dimitri Fontaine wrote: This way, no need to switch IP addresses, the clients just connect as usual and get results back and do not have to know whether the host they're querying against is a slave or a master. This level of smartness is in -core. Oh, and if you want clients to connect to a single IP and hit either the master or the slave with some weights to choose one or the other, and a way to remove from pool on failure etc, I think using haproxy in TCP mode would do it. HaProxy is really nice for this purpose. http://haproxy.1wt.eu/ Regards, -- dim
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum [EMAIL PROTECTED] wrote: On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote: Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB binary stuff every 30 second or every minute but you only have some kbyte real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (means: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16Mb file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. The real problem is not the 16 MB, the problem is: you can't compress this file. If the logfile is rotated it still contains all the old binary data which is not a good starter for compression. Using bzip2 in my archive_command script, my WAL files are normally compressed to between 2MB and 5MB, depending on the write load (larger, and more of them, in the middle of the day). bzip2 compression is more expensive and rotated WAL files are not particularly compressable to be sure, but due to (and given) the nature of the data bzip2 works pretty well, and much better than gzip. So you may have some kB changes in the wal logfile every minute but you still copy 16 MB data. Sure, it's not so much - but if you rotate a logfile every minute this still transfers 16*60*24 = ~23 GB a day. I archived 1965 logs yesterday on one instance of my app totalling 8.5GB ... not to bad, really. -- Mike Rylander | VP, Research and Design | Equinox Software, Inc. 
/ The Evergreen Experts | phone: 1-877-OPEN-ILS (673-6457) | email: [EMAIL PROTECTED] | web: http://www.esilibrary.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
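A minimal sketch of the kind of bzip2-compressing archive step Mike describes, written as a shell function; the function name, destination directory, and ARCHIVE_DIR variable are assumptions for illustration, not his actual script:

```shell
# Sketch of a bzip2-compressing archive step. PostgreSQL would invoke it via
# something like: archive_command = 'archive_wal.sh %p %f'
# where %p is the path to the completed segment and %f its bare file name.
archive_wal() {
    wal_path="$1"                          # %p: completed 16 MB WAL segment
    wal_name="$2"                          # %f: segment file name only
    dest="${ARCHIVE_DIR:-/archive/wal}"    # hypothetical destination directory

    # Compress to a temporary name first and rename afterwards, so anything
    # polling $dest never picks up a half-written file. A non-zero exit makes
    # the archiver retry the segment later.
    bzip2 -c "$wal_path" > "$dest/$wal_name.bz2.tmp" &&
    mv "$dest/$wal_name.bz2.tmp" "$dest/$wal_name.bz2"
}
```

This matches the behaviour in the thread: a mostly-empty rotated segment still costs a full bzip2 pass, but the output shrinks to a few MB or less depending on how much real WAL data the segment holds.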
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, May 31, 2008 at 2:18 AM, Mike Rylander [EMAIL PROTECTED] wrote: On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum [EMAIL PROTECTED] wrote: On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote: Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary stuff every 30 seconds or every minute but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (means: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16MB file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. The real problem is not the 16 MB; the problem is that you can't compress this file. If the logfile is rotated it still contains all the old binary data, which is not a good starting point for compression. Using bzip2 in my archive_command script, my WAL files are normally compressed to between 2MB and 5MB, depending on the write load (larger, and more of them, in the middle of the day). bzip2 compression is more expensive, and rotated WAL files are not particularly compressible to be sure, but given the nature of the data bzip2 works pretty well, and much better than gzip. Compression especially is going to negate one of the big advantages of WAL shipping, namely that it is a cheap investment in terms of load on the master. A gigabit link can ship a lot of log files; you can always bond links, and 10GigE is coming. 
IMO the key trick is to make sure you don't send the log file more than once from the same source, i.e. a cascading relay. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
Merlin Moncure wrote: On Sat, May 31, 2008 at 2:18 AM, Mike Rylander [EMAIL PROTECTED] wrote: On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum Compression especially is going to negate one of the big advantages of WAL shipping, namely that it is a cheap investment in terms of load on the master. A gigabit link can ship a lot of log files, you can always Who has a gigabit link between Dallas and Atlanta? That is the actual problem here. Switch-to-switch compression is a waste of time (if you aren't running GigE, what are you doing???). Sincerely, Joshua D. Drake -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. The current design of pg_standby is utterly incapable of handling that requirement. So there might be an implementation dependency there, depending on how we want to solve that problem. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. There could be multiple slaves following a master, some serving data-warehousing queries, some for load-balancing reads, some others just for disaster recovery, and then some just to mitigate human errors by re-applying the logs with a delay. I don't think any one installation would see all of the above mentioned scenarios, but we need to take care of multiple slaves operating off of a single master; something similar to cascaded Slony-I. My two cents. Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, May 29, 2008 at 01:58:34PM -0700, David Fetter wrote: If people on core had come to the idea that we needed to build in replication *before* 8.3 came out, they certainly didn't announce it. Now is a great time to mention this because it gives everybody time to: 1. Come to a consensus on what the out-of-the-box replication should be, and 2. Build, test and debug whatever the consensus out-of-the-box replication turns out to be. None of that is an argument for why this has to go in 8.4. I argued in Ottawa that the idea that you have to plan a feature for _the next release_ is getting less tenable with each release. This is because major new features for Postgres are now often big and complicated. The days of big gains from single victories are mostly over (though there are exceptions, like HOT). Postgres is already mature. As with the middle-aged person with a mortgage, longer-term planning is simply a necessary part of life now. There are two possibilities here. One is to have huge releases on much longer timetables. I think this is unsustainable in a free project, because people will get bored and go away if they don't get to use the results of their work in a reasonably short time frame. The other is to accept that sometimes, planning and development for new features will have to start a long time before actual release -- maybe planning and some coding for 2 releases out. That allows large features like the one we're discussing to be developed responsibly without making everything else wait for it. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On 5/30/08, Gurjeet Singh [EMAIL PROTECTED] wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. The current design of pg_standby is utterly incapable of handling that requirement. So there might be an implementation dependency there, depending on how we want to solve that problem. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. I disagree - it's better to consider synchronized WAL slaves as equal to the master, so having queries there affect the master is OK. You need to remember this solution is not trying to replace 100-node Slony-I setups. You can run sanity checks on slaves or use them to load-balance read-only OLTP queries, but not random stuff. There could be multiple slaves following a master, some serving data-warehousing queries, some for load-balancing reads, some others just for disaster recovery, and then some just to mitigate human errors by re-applying the logs with a delay. To run warehousing queries you had better use Slony-I / Londiste. For warehousing you want different / more indexes on tables anyway, so I think it's quite OK to say don't do it for complex queries on WAL slaves. I don't think any one installation would see all of the above mentioned scenarios, but we need to take care of multiple slaves operating off of a single master; something similar to cascaded Slony-I. Again, synchronized WAL replication is not a generic solution. Use Slony/Londiste if you want to get totally independent slaves. Thankfully core has set concrete and limited goals, which means it is possible to see working code in a reasonable time. I think that should apply to read-only slaves too. 
If we try to make it handle any load, it will never be finished. Now if we limit the scope, I've seen 2 variants thus far: 1) Keep the slave maximally in sync, letting the load there affect the master (xmin). - Slave can be used to load-balance OLTP load - Slave should not be used for complex queries. 2) If a long query is running, let the slave lag (avoid applying WAL data). - Slave cannot be used to load-balance OLTP load - Slave can be used for complex queries (although no new indexes or temp tables can be created). I think 1) is the more important (and more easily implemented) case. For 2) we already have solutions (Slony/Londiste/Bucardo, etc.) so there is no point making an effort to solve this here. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. We ruled that out as the-only-solution a while back. It does have the beauty of simplicity, so it may exist as an option or possibly the only way, for 8.4. Yeah. The point is that it's fairly clear that we could make that work. A solution that doesn't impact the master at all would be nicer, but it's not at all clear to me that one is possible, unless we abandon WAL-shipping as the base technology. Quite. Before we start ruling things out let's know what we think we can actually do. I hope that NTT will release their code ASAP so we will have a better idea of what we have and what we need. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. We ruled that out as the-only-solution a while back. It does have the beauty of simplicity, so it may exist as an option or possibly the only way, for 8.4. Yeah. The point is that it's fairly clear that we could make that work. A solution that doesn't impact the master at all would be nicer, but it's not at all clear to me that one is possible, unless we abandon WAL-shipping as the base technology. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
Hi Tom, Thanks for the reasoned reply. As you saw from point #2 in my comments, I think you should do this feature. I hope this answers Josh Berkus' concern about my comments. You make a very interesting comment which seems to go to the heart of this design approach: About the only thing that would make me want to consider row-based replication in core would be if we determine that read-only slave queries are impractical atop a WAL-log-shipping implementation. It's possible I'm misunderstanding some of the implementation issues, but it is striking that the detailed responses to your proposal list a number of low-level dependencies between master and slave states when replicating WAL records. It appears that you are designing a replication mechanism that works effectively between a master and a relatively small number of nearby slaves. This is clearly an important use case but it also seems clear that the WAL approach is not a general-purpose approach to replication. In other words, you'll incrementally get to that limited end point I describe. This will still leave a lot to be desired on read scaling, not to mention many other cases. Hence my original comments. However, rather than harp on that further I will open up a separate thread to describe a relatively small set of extensions to PostgreSQL that would be enabling for a wide range of replication applications. Contrary to popular opinion these extensions are actually well understood at the theory level and have been implemented as prototypes as well as in commercial patches multiple times in different databases. Those of us who are deeply involved in replication deserve just condemnation for not stepping up and getting our thoughts out on the table. Meanwhile, I would be interested in your reaction to these thoughts on the scope of the real-time WAL approach. There's obviously tremendous interest in this feature. 
A general description that goes beyond the NTT slides would be most helpful for further discussions. Cheers, Robert P.s., The NTT slides were really great. Takahiro and Masao deserve congratulations on an absolutely first-rate presentation. On 5/29/08 9:09 PM, Tom Lane [EMAIL PROTECTED] wrote: Andrew Sullivan [EMAIL PROTECTED] writes: On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote: people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication. You know, I gave a talk in Ottawa just last week about how the last effort to develop a comprehensive API for replication failed. Indeed, core's change of heart on this issue was largely driven by Andrew's talk and subsequent discussion. We had more or less been waiting for the various external replication projects to tell us what they wanted in this line, and it was only the realization that no such thing was likely to happen that forced us to think seriously about what could be done within the core project. As I said originally, we have no expectation that the proposed features will displace the existing replication projects for high end replication problems ... and I'd characterize all of Robert's concerns as high end problems. We are happy to let those be solved outside the core project. About the only thing that would make me want to consider row-based replication in core would be if we determine that read-only slave queries are impractical atop a WAL-log-shipping implementation. Which could happen; in fact I think that's the main risk of the proposed development plan. But I also think that the near-term steps of the plan are worth doing anyway, for various other reasons, and so we won't be out too much effort if the plan fails. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, May 30, 2008 at 9:31 AM, Marko Kreen [EMAIL PROTECTED] wrote: On 5/30/08, Gurjeet Singh [EMAIL PROTECTED] wrote: I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. I disagree - it's better to consider syncronized WAL-slaves as equal to master, so having queries there affect master is ok. You need to remeber this solution tries not to replace 100-node Slony-I setups. You can run sanity checks on slaves or use them to load-balance read-only OLTP queries, but not random stuff. There could be multiple slaves following a master, some serving data-warehousing queries, some for load-balancing reads, some others just for disaster recovery, and then some just to mitigate human errors by re-applying the logs with a delay. To run warehousing queries you better use Slony-I / Londiste. For warehousring you want different / more indexes on tables anyway, so I think it's quite ok to say don't do it for complex queries on WAL-slaves. I don't think any one installation would see all of the above mentioned scenarios, but we need to take care of multiple slaves operating off of a single master; something similar to cascaded Slony-I. Again, the synchronized WAL replication is not generic solution. Use Slony/Londiste if you want to get totally independent slaves. I strongly agree with Gurjeet. The warm standby replication mechanism is pretty simple and is wonderfully flexible with the one big requirement that your clusters have to be mirrors of each other. Synchronous wal replication obviously needs some communication channel from the slave back to the master. Hopefully, it will be possible to avoid this for asynchronous shipping. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 2008-05-30 at 11:30 -0400, Andrew Dunstan wrote: Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. We ruled that out as the-only-solution a while back. It does have the beauty of simplicity, so it may exist as an option or possibly the only way, for 8.4. Yeah. The point is that it's fairly clear that we could make that work. A solution that doesn't impact the master at all would be nicer, but it's not at all clear to me that one is possible, unless we abandon WAL-shipping as the base technology. Quite. Before we start ruling things out let's know what we think we can actually do. Let me re-phrase: I'm aware of that possibility and believe we can and could do it for 8.4. My assessment is that people won't find it sufficient and I am looking at other alternatives also. There may be a better one possible for 8.4, there may not. Hence I've said something in 8.4, something better later. There is no need to decide that is the only way forward, yet. I hope and expect to put some of these ideas into a more concrete form, but this has not yet happened. Nothing has slipped, not having any trouble getting on with it, just that my plans were to not start it yet. I think having a detailed design ready for review by September commit fest is credible. I hope that NTT will release their code ASAP so we will have a better idea of what we have and what we need. 
That has very little to do with Hot Standby, though there could be patch conflicts, which is why I'm aiming to get WAL streaming done first. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote: On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote: But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. The current design of pg_standby is utterly incapable of handling that requirement. So there might be an implementation dependency there, depending on how we want to solve that problem. I think it would be best to not make the slave interfere with the master's operations; that's only going to increase the operational complexity of such a solution. There could be multiple slaves following a master, some serving data-warehousing queries, some for load-balancing reads, some others just for disaster recovery, and then some just to mitigate human errors by re-applying the logs with a delay. Agreed. We ruled that out as the-only-solution a while back. It does have the beauty of simplicity, so it may exist as an option or possibly the only way, for 8.4. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 29 May 2008 09:22:26 -0700 Steve Atkins wrote: On May 29, 2008, at 9:12 AM, David Fetter wrote: Either one of these would be great, but something that involves machines that stay useless most of the time is just not going to work. I have customers who are thinking about warm standby functionality, and the only thing stopping them deploying it is complexity and maintenance, not the cost of the HA hardware. If trivial-to-deploy replication that didn't offer read-only access of the slaves were available today I'd bet that most of them would be using it. Sure, I have a similar customer. They are right now using a set of Perl scripts which ship the logfiles to the slave, take care of the status, apply the logfiles, validate checksums, etc. The whole thing works very well in combination with RedHat cluster software, but it took several weeks to implement the current solution. Not everyone wants to spend the time and the manpower to implement simple replication. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
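One step of the kind of ship-and-verify tooling Andreas describes can be sketched as a shell function; all names and directories here are hypothetical (a real setup would copy over ssh/rsync rather than to a local path), and this stands in for the checksum-validation part of the Perl scripts he mentions:

```shell
# Sketch: compress a WAL segment, record its checksum, copy both to the
# slave's incoming directory, and verify the copy before reporting success.
ship_wal() {
    wal_path="$1"                            # path to the completed segment
    wal_name="$2"                            # bare segment file name
    stage="${STAGE_DIR:-/tmp/wal-stage}"     # local staging area (assumed)
    dest="${SLAVE_DIR:-/slave/incoming}"     # slave's incoming dir (assumed)

    gzip -c "$wal_path" > "$stage/$wal_name.gz"
    md5sum "$stage/$wal_name.gz" | cut -d' ' -f1 > "$stage/$wal_name.gz.md5"
    cp "$stage/$wal_name.gz" "$stage/$wal_name.gz.md5" "$dest/"

    # Re-check the checksum on the destination copy; a non-zero exit here
    # tells the caller the segment is not safely shipped yet.
    [ "$(md5sum "$dest/$wal_name.gz" | cut -d' ' -f1)" = "$(cat "$dest/$wal_name.gz.md5")" ]
}
```

The point of the example is the amount of state handling involved even in this simplified form, which is exactly the complexity Andreas says kept taking weeks to get right.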
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 29 May 2008 18:29:01 -0400 Tom Lane wrote: Dimitri Fontaine [EMAIL PROTECTED] writes: While at it, would it be possible for the simple part of the core team statement to include automatic failover? No, I think it would be a useless expenditure of energy. Failover includes a lot of things that are not within our purview: switching IP addresses to point to the new server, some kind of STONITH solution to keep the original master from coming back to life, etc. Moreover there are already projects/products concerned with those issues. True words. Failover is not and should not be part of PostgreSQL. But PG can help the failover solution; as an example, an easy-to-use interface for checking the current slave status comes to mind. Other ideas might also be possible. It might be useful to document where to find solutions to that problem, but we can't take it on as part of core Postgres. Ack Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary stuff every 30 seconds or every minute but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (means: no high-speed connection, maybe even pay-per-traffic) to the slave. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
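The trade-off being debated falls out of two settings; a sketch with illustrative values (not a recommendation):

```ini
# postgresql.conf -- illustrative values only
archive_timeout = 60                   # force a segment switch every 60 seconds
archive_command = 'cp %p /archive/%f'  # ships the full 16 MB segment each time
# Worst case at this setting: 16 MB * 60 * 24 = 23040 MB, roughly 23 GB/day,
# even if only a few kilobytes of WAL were actually generated per minute.
```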
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote: Then you ship 16 MB binary stuff every 30 second or every minute but you only have some kbyte real data in the logfile. Not if you use pg_clearxlogtail ( http://www.2ndquadrant.com/replication.htm ), which got lost in the giant March commitfest queue but should probably wander into contrib as part of 8.4. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
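Greg's suggestion composes naturally with compression in the archive_command itself. A sketch, assuming the filter-style stdin/stdout invocation of the pg_clearxlogtail tool he links to (the archive path is hypothetical):

```ini
# postgresql.conf -- hypothetical pipeline; pg_clearxlogtail zeroes everything
# after the last valid WAL record, so the unused tail of a mostly-empty 16 MB
# segment compresses to almost nothing.
archive_command = 'pg_clearxlogtail < %p | gzip > /archive/%f.gz'
```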
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, May 29, 2008 at 7:42 PM, Tom Lane [EMAIL PROTECTED] wrote: The big problem is that long-running slave-side queries might still need tuples that are vacuumable on the master, and so replication of vacuuming actions would cause the slave's queries to deliver wrong answers. Another issue with read-only slaves just popped up in my head. How do we block the readers on the slave while it is replaying an ALTER TABLE or similar command that requires an exclusive lock and potentially alters the table's structure? Or does the WAL replay already take an exclusive lock on such a table? Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 2008-05-30 at 11:12 -0700, Robert Hodges wrote: This is clearly an important use case but it also seems clear that the WAL approach is not a general-purpose approach to replication. I think we cannot make such a statement yet, if ever. I would note that log-based replication is now the mainstay of commercial database replication techniques for loosely-coupled groups of servers. It would seem strange to assume that it should not be good for us too, simply because we know it to be difficult. IMHO the project has a pretty good track record of delivering functionality that looked hard at first glance. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thursday 29 May 2008 22:59:21 Merlin Moncure wrote: On Thu, May 29, 2008 at 9:26 PM, Josh Berkus [EMAIL PROTECTED] wrote: I fully accept that it may be the case that it doesn't make technical sense to tackle them in any order besides sync-read-only slaves because of dependencies in the implementation between the two. If that's the case, it would be nice to explicitly spell out what that was to deflect criticism of the planned prioritization. There's a very simple reason to prioritize the synchronous log shipping first; NTT may open source their solution and we'll get it a lot sooner than the other components. That's a good argument. I just read the NTT document and the stuff looks fantastic. You've convinced me... It would be a better argument if the NTT guys hadn't said that they estimated 6 months' time before the code would be released, which puts us beyond 8.4. Now it is possible that the time frame could be sooner, but unless someone already has the patch, this reminds me a little too much of the arguments for including Windows support in a single release because we already had a working port/patch set to go from. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary stuff every 30 seconds or every minute but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (means: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16MB file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thursday 29 May 2008 20:31:31 Greg Smith wrote: On Thu, 29 May 2008, Tom Lane wrote: There's no point in having read-only slave queries if you don't have a trustworthy method of getting the data to them. This is a key statement that highlights the difference in how you're thinking about this compared to some other people here. As far as some are concerned, the already working log shipping *is* a trustworthy method of getting data to the read-only slaves. There are plenty of applications (web oriented ones in particular) where if you could direct read-only queries against a slave, the resulting combination would be a giant improvement over the status quo even if that slave was as much as archive_timeout behind the master. That quantity of lag is perfectly fine for a lot of the same apps that have read scalability issues. If you're someone who falls into that camp, the idea of putting the sync replication job before the read-only slave one seems really backwards. Just looking at it from an overall market perspective, synchronous log shipping pretty much only addresses failover needs, whereas read-only slaves address both failover and scaling issues. (Note I say address, not solve). -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Friday 30 May 2008 01:10:20 Tom Lane wrote: Greg Smith [EMAIL PROTECTED] writes: I fully accept that it may be the case that it doesn't make technical sense to tackle them in any order besides sync-read-only slaves because of dependencies in the implementation between the two. Well, it's certainly not been my intention to suggest that no one should start work on read-only-slaves before we finish the other part. The point is that I expect the log shipping issues will be done first because they're easier, and it would be pointless to not release that feature if we had it. But since you mention it: one of the plausible answers for fixing the vacuum problem for read-only slaves is to have the slaves push an xmin back upstream to the master to prevent premature vacuuming. The current design of pg_standby is utterly incapable of handling that requirement. So there might be an implementation dependency there, depending on how we want to solve that problem. Sure, but who's to say that after synchronous WAL shipping is finished it won't need a serious re-write due to new needs from the hot standby feature. I think going either way carries some risk. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, May 31, 2008 at 1:52 AM, Greg Smith [EMAIL PROTECTED] wrote: On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote: Then you ship 16 MB binary stuff every 30 second or every minute but you only have some kbyte real data in the logfile. Not if you use pg_clearxlogtail ( http://www.2ndquadrant.com/replication.htm ), which got lost in the giant March commitfest queue but should probably wander into contrib as part of 8.4. This means we need to modify pg_standby to not check for filesize when reading XLogs. Best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
Re: [HACKERS] Core team statement on replication in PostgreSQL
Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary data every 30 seconds or every minute, but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (meaning: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16 MB file, particularly if compressed, every minute or so is not such a huge problem for a great many commercial users, and even many domestic users. Sumitomo Electric Co., Ltd., a company in Japan with 20 billion dollars in sales (parent company of Sumitomo Electric Information Systems Co., Ltd., one of the companies supporting Recursive SQL development), uses 100 PostgreSQL servers. They are doing backups by using log shipping to another data center and have problems with the amount of log data being transferred. They said this is one of the big problems they have with PostgreSQL and hope it will be solved in the near future. -- Tatsuo Ishii SRA OSS, Inc. Japan
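For reference, the archive_timeout facility Andrew mentions is a few lines of postgresql.conf; a minimal sketch (the one-minute timeout and the copy destination here are illustrative values only, not recommendations):

```ini
# postgresql.conf -- force a WAL segment switch at least once a minute,
# so the standby never lags more than about 60 seconds behind the master
archive_mode = on                                     # enable WAL archiving (8.3+)
archive_timeout = 60                                  # seconds between forced segment switches
archive_command = 'cp "%p" /mnt/standby_archive/"%f"' # %p = segment path, %f = file name
```

Note this is exactly the source of the traffic complaint below: each forced switch ships a full 16 MB segment regardless of how little of it is actually used.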
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, 2008-05-31 at 02:48 +0530, Gurjeet Singh wrote: Not if you use pg_clearxlogtail ( http://www.2ndquadrant.com/replication.htm ), which got lost in the giant March commitfest queue but should probably wander into contrib as part of 8.4. This means we need to modify pg_standby to not check for filesize when reading XLogs. Best regards, It does. Joshua D. Drake
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, 31 May 2008, Gurjeet Singh wrote: Not if you use pg_clearxlogtail This means we need to modify pg_standby to not check for filesize when reading XLogs. No, the idea is that you run the segments through pg_clearxlogtail | gzip, which then compresses lightly used segments massively because all the unused bytes are 0. File comes out the same size at the other side, but you didn't ship a full 16MB if there was only a few KB used. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
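Greg's point is easy to demonstrate even without pg_clearxlogtail itself: a 16 MB segment whose unused tail still holds stale bytes barely compresses, while the same segment with its tail zeroed shrinks to roughly the size of the real data. A rough simulation (the file names and the 64 KB "real data" figure are arbitrary choices for the demo; sizes match a default WAL segment):

```shell
# Recycled segment: 64 KB of fresh records followed by old, incompressible bytes.
dd if=/dev/urandom of=stale.seg bs=1M count=16 2>/dev/null
# Zeroed-tail segment: the same 64 KB of records, rest of the 16 MB zeroed
# (this is what pg_clearxlogtail would produce).
( dd if=stale.seg bs=64K count=1; dd if=/dev/zero bs=64K count=255 ) > clean.seg 2>/dev/null
# Compress both; only the zeroed-tail file shrinks usefully.
gzip -c stale.seg > stale.seg.gz
gzip -c clean.seg > clean.seg.gz
ls -l stale.seg.gz clean.seg.gz   # clean.seg.gz is a tiny fraction of 16 MB
```

Both files arrive at the standby as 16 MB after decompression, so pg_standby's size check still passes; only the bytes on the wire differ.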
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 2008-05-30 at 01:10 -0400, Tom Lane wrote: Greg Smith [EMAIL PROTECTED] writes: I fully accept that it may be the case that it doesn't make technical sense to tackle them in any order besides sync-read-only slaves because of dependencies in the implementation between the two. Well, it's certainly not been my intention to suggest that no one should start work on read-only-slaves before we finish the other part. The point is that I expect the log shipping issues will be done first because they're easier, and it would be pointless to not release that feature if we had it. Agreed. I'm arriving late to a thread that seems to have grown out of all proportion. AFAICS streaming WAL and hot standby are completely orthogonal features. Streaming WAL is easier, and if NTT can release their code to open source we may get this in the September commit fest. Hot Standby is harder, and it was my viewpoint at PGCon that we may not have a perfect working version of this by the end of 8.4. We are very likely to have something working, but maybe not the whole feature set we might wish to have. I expect to be actively working on this soon. I definitely do want to see WAL streaming going in as early as possible and before the end of 8.4, otherwise code conflicts and other difficulties are likely to push out the 8.4 date and/or Hot Standby. So as I see it, Tom has only passed on my comments on this, not added or removed anything. The main part of the announcement was really about bringing the WAL streaming into core and effectively favouring it over a range of other projects. Can we all back off a little on this for now? Various concerns have been validly expressed, but it will all come good AFAICS. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Friday 30 May 2008, Tom Lane wrote: No, I think it would be a useless expenditure of energy. Failover includes a lot of things that are not within our purview: switching IP addresses to point to the new server, some kind of STONITH solution to keep the original master from coming back to life, etc. Moreover there are already projects/products concerned with those issues. Well, I forgot that there's in fact no active plan to put pgbouncer features into core at the moment (I'm sure I've read something about this on the lists, though). If that were the case, the slave could proxy queries to the master, and stop proxying but serve them itself if the master tells it it's dying. This way there's no need to switch IP addresses; the clients just connect as usual and get results back, and do not have to know whether the host they're querying is a slave or a master. This level of smartness is in -core. The STONITH part in case of a known (fatal) failure does not seem that hard either, as the master at fatal time could write somewhere that it's now a slave and use this at next startup time (recovery.conf?). If it can't even do that, well, we're back to a crash situation with no automatic failover solution provided. Unhandled failure cases will obviously continue to exist. I'm not asking for all cases to be managed in -core, just for some level of effort on the topic. Of course, I'm just the one asking questions and trying to raise ideas, so I'm perfectly fine with your current answer (useless expenditure of energy) even if somewhat disagreeing on the useless part of it :) As for the integrated pgbouncer daemon part, I'm thinking this would allow the infamous part 3 of the proposal (read-only slave) to become pretty simple to set up when ready: the slave knows who its master is, and as soon as an XID is needed the transaction's queries are forwarded/proxied to it. Thanks again Florian ! 
It might be useful to document where to find solutions to that problem, but we can't take it on as part of core Postgres. Even the part where it makes sense (provided it does and I'm not completely off track here)? Regards, -- dim
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 30 May 2008 16:22:41 -0400 (EDT) Greg Smith wrote: On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote: Then you ship 16 MB of binary data every 30 seconds or every minute but you only have a few kilobytes of real data in the logfile. Not if you use pg_clearxlogtail ( http://www.2ndquadrant.com/replication.htm ), which got lost in the giant March commitfest queue but should probably wander into contrib as part of 8.4. Yes, this topic was discussed several times in the past, but to solve this it needs a patch/solution which is integrated into PG itself, not contrib. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 2008-05-29 at 10:12 -0400, Tom Lane wrote: The Postgres core team met at PGCon to discuss a few issues, the largest of which is the need for simple, built-in replication for PostgreSQL. Historically the project policy has been to avoid putting replication into core PostgreSQL, so as to leave room for development of competing solutions, recognizing that there is no one-size-fits-all replication solution. However, it is becoming clear that this policy is hindering acceptance of PostgreSQL to too great an extent, compared to the benefit it offers to the add-on replication projects. Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases. In practice, simple asynchronous single-master-multiple-slave replication covers a respectable fraction of use cases, so we have concluded that we should allow such a feature to be included in the core project. We emphasize that this is not meant to prevent continued development of add-on replication projects that cover more complex use cases. We believe that the most appropriate base technology for this is probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon. We hope that such a feature can be completed for 8.4. Ideally this would be coupled with the ability to execute read-only queries on the slave servers, but we see technical difficulties that might prevent that from being completed before 8.5 or even further out. (The big problem is that long-running slave-side queries might still need tuples that are vacuumable on the master, and so replication of vacuuming actions would cause the slave's queries to deliver wrong answers.) Again, this will not replace Slony, pgPool, Continuent, Londiste, or other systems for many users, as it will not be highly scalable nor support long-distance replication nor replicating less than an entire installation. 
But it is time to include a simple, reliable basic replication feature in the core system. I'm in full support of this and commend the work of the NTT team. The goals and timescales are realistic, and setting a timetable in this way will help planning for many users. I'm expecting to lead the charge on the Hot Standby project. The problem mentioned is just one of the issues, though overall I'm now optimistic about our eventual success in that area. I'm discussing this now with a couple of sponsors and would welcome serious financial commitments to this goal. Please contact me off-list if you agree also. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Re: [HACKERS] Core team statement on replication in PostgreSQL
Andrew, Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16Mb file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. The issue is that when you're talking about telecommunications companies (and similar), once a minute isn't adequate. Those folks want at least every second, or better yet, synchronous. Anyway, this is a pretty pointless discussion given that we want both capabilities, and stuff will get implemented in the order it makes technical sense. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Fwd: Re: [HACKERS] Core team statement on replication in PostgreSQL
[Looks like this mail missed the hackers list on reply-to-all; I wonder how that could happen... so I forward it] On Thu, 2008-05-29 at 17:00 +0100, Dave Page wrote: Yes, we're talking real-time streaming (synchronous) log shipping. Is there already a design for how this would be implemented? Some time ago I was interested in this subject and thought about having a normal postgres connection where the slave would issue a query to a special view which would simply stream WAL records as bytea from a requested point, without ever finishing. This would have the advantage (over a custom socket streaming solution) that it can reuse the complete infrastructure of connection making/managing/security (think SSL for no sniffing) of the postgres server. It would also be a public interface, which could be used by other possible tools too (think PITR management application/WAL stream repository). Another advantage would be that a PITR solution could serve as a source for the WAL stream too, so the slave could either get the real-time stream from the master, or rebuild a PITR state from a WAL repository server, using the same interface... Probably some kind of WAL subscription management should also be implemented, so that the slave can signal the master which WAL records it has already applied and which can be recycled on the master, and it would be nice if there could be multiple subscribers at the same time. Some subscriber time-out could also be implemented, marking the subscription as timed out, so that the slave knows it has to rebuild itself... Cheers, Csaba.
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote: Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB of binary data every 30 seconds or every minute but you only have a few kilobytes of real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (means: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16Mb file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. The real problem is not the 16 MB; the problem is that you can't compress this file effectively. If the logfile is rotated it still contains all the old binary data, which is not a good starting point for compression. So you may have some kB of changes in the WAL logfile every minute, but you still copy 16 MB of data. Sure, it's not that much - but if you rotate a logfile every minute this still transfers 16*60*24 = ~23 GB a day. Kind regards -- Andreas 'ads' Scherbaum German PostgreSQL User Group
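Andreas's arithmetic is easy to verify; a quick sanity check of the worst-case volume (one forced 16 MB segment switch per minute, shipped uncompressed):

```shell
# One 16 MB WAL segment shipped per minute, around the clock.
SEG_MB=16
SWITCHES_PER_DAY=$((60 * 24))               # one forced switch per minute
TOTAL_MB=$((SEG_MB * SWITCHES_PER_DAY))     # daily volume in MB
echo "${TOTAL_MB} MB/day, i.e. about $((TOTAL_MB / 1000)) GB/day"
# prints "23040 MB/day, i.e. about 23 GB/day"
```

This is the uncompressed worst case; the pg_clearxlogtail approach discussed elsewhere in the thread attacks exactly this number by making mostly-empty segments compressible.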
Re: [HACKERS] Core team statement on replication in PostgreSQL
Tatsuo Ishii wrote: Andreas 'ads' Scherbaum wrote: On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote: Well, yes, but you do know about archive_timeout, right? No need to wait 2 hours. Then you ship 16 MB binary stuff every 30 second or every minute but you only have some kbyte real data in the logfile. This must be taken into account, especially if you ship the logfile over the internet (means: no high-speed connection, maybe even pay-per-traffic) to the slave. Sure there's a price to pay. But that doesn't mean the facility doesn't exist. And I rather suspect that most of Josh's customers aren't too concerned about traffic charges or affected by such bandwidth restrictions. Certainly, none of my clients are, and they aren't in the giant class. Shipping a 16Mb file, particularly if compressed, every minute or so, is not such a huge problem for a great many commercial users, and even many domestic users. Sumitomo Electric Co., Ltd., a 20 billion dollars selling company in Japan (parent company of Sumitomo Electric Information Systems Co., Ltd., which is one of the Recursive SQL development support company) uses 100 PostgreSQL servers. They are doing backups by using log shipping to another data center and have problems with the amount of the transferring log data. They said this is one of the big problems they have with PostgreSQL and hope it will be solved in the near future. Excellent data point. Now, what I'd like to know is whether they are getting into trouble simply because of the volume of log data generated or because they have a short archive_timeout set. If it's the former (which seems more likely) then none of the ideas I have seen so far in this discussion seemed likely to help, and that would indeed be a major issue we should look at. Another question is this: are they being overwhelmed by the amount of network traffic generated, or by difficulty in postgres producers and/or consumers to keep up? 
If it's network traffic, then perhaps compression would help us. Maybe we need to set some goals for the level of log volume we expect to be able to create/send/consume. cheers andrew
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Sat, May 31, 2008 at 3:41 AM, Greg Smith [EMAIL PROTECTED] wrote: On Sat, 31 May 2008, Gurjeet Singh wrote: Not if you use pg_clearxlogtail This means we need to modify pg_standby to not check for filesize when reading XLogs. No, the idea is that you run the segments through pg_clearxlogtail | gzip, which then compresses lightly used segments massively because all the unused bytes are 0. File comes out the same size at the other side, but you didn't ship a full 16MB if there was only a few KB used. Got it. I remember reading about pg_clearxlogtail in these mailing lists; but somehow forgot how it actually worked! -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com EnterpriseDB http://www.enterprisedb.com Mail sent from my BlackLaptop device
[HACKERS] Core team statement on replication in PostgreSQL
The Postgres core team met at PGCon to discuss a few issues, the largest of which is the need for simple, built-in replication for PostgreSQL. Historically the project policy has been to avoid putting replication into core PostgreSQL, so as to leave room for development of competing solutions, recognizing that there is no one-size-fits-all replication solution. However, it is becoming clear that this policy is hindering acceptance of PostgreSQL to too great an extent, compared to the benefit it offers to the add-on replication projects. Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases. In practice, simple asynchronous single-master-multiple-slave replication covers a respectable fraction of use cases, so we have concluded that we should allow such a feature to be included in the core project. We emphasize that this is not meant to prevent continued development of add-on replication projects that cover more complex use cases. We believe that the most appropriate base technology for this is probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon. We hope that such a feature can be completed for 8.4. Ideally this would be coupled with the ability to execute read-only queries on the slave servers, but we see technical difficulties that might prevent that from being completed before 8.5 or even further out. (The big problem is that long-running slave-side queries might still need tuples that are vacuumable on the master, and so replication of vacuuming actions would cause the slave's queries to deliver wrong answers.) Again, this will not replace Slony, pgPool, Continuent, Londiste, or other systems for many users, as it will not be highly scalable nor support long-distance replication nor replicating less than an entire installation. But it is time to include a simple, reliable basic replication feature in the core system. 
regards, tom lane
Re: [HACKERS] Core team statement on replication in PostgreSQL
On 5/29/08, Tom Lane [EMAIL PROTECTED] wrote: The Postgres core team met at PGCon to discuss a few issues, the largest of which is the need for simple, built-in replication for PostgreSQL. Historically the project policy has been to avoid putting replication into core PostgreSQL, so as to leave room for development of competing solutions, recognizing that there is no one-size-fits-all replication solution. However, it is becoming clear that this policy is hindering acceptance of PostgreSQL to too great an extent, compared to the benefit it offers to the add-on replication projects. Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases. In practice, simple asynchronous single-master-multiple-slave replication covers a respectable fraction of use cases, so we have concluded that we should allow such a feature to be included in the core project. We emphasize that this is not meant to prevent continued development of add-on replication projects that cover more complex use cases. We believe that the most appropriate base technology for this is probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon. We hope that such a feature can be completed for 8.4. +1 Although I would explain it more briefly - we do need a solution for lossless failover servers, and such a solution needs to live in the core backend. Ideally this would be coupled with the ability to execute read-only queries on the slave servers, but we see technical difficulties that might prevent that from being completed before 8.5 or even further out. (The big problem is that long-running slave-side queries might still need tuples that are vacuumable on the master, and so replication of vacuuming actions would cause the slave's queries to deliver wrong answers.) 
Well, both Slony-I and the upcoming Skytools 3 have the same problem when cleaning events, and solve it simply by having the slaves report back their lowest position on the event stream. I cannot see why that cannot be applied in this case too. So each slave just needs to report its own longest open tx as open to the master. Yes, it bloats the master, but there's no way around it. The only problem could be the plan to vacuum tuples updated in between a long-running tx and the regular ones, but such behaviour can just be turned off. We could also have an option of an inaccessible slave, for those who fear bloat on the master. -- marko
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote: The Postgres core team met at PGCon to discuss a few issues, the largest of which is the need for simple, built-in replication for PostgreSQL. Historically the project policy has been to avoid putting replication into core PostgreSQL, so as to leave room for development of competing solutions, recognizing that there is no one size fits all replication solution. However, it is becoming clear that this policy is hindering acceptance of PostgreSQL to too great an extent, compared to the benefit it offers to the add-on replication projects. Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases. In practice, simple asynchronous single-master-multiple-slave replication covers a respectable fraction of use cases, so we have concluded that we should allow such a feature to be included in the core project. We emphasize that this is not meant to prevent continued development of add-on replication projects that cover more complex use cases. We believe that the most appropriate base technology for this is probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon. We hope that such a feature can be completed for 8.4. Ideally this would be coupled with the ability to execute read-only queries on the slave servers, but we see technical difficulties that might prevent that from being completed before 8.5 or even further out. (The big problem is that long-running slave-side queries might still need tuples that are vacuumable on the master, and so replication of vacuuming actions would cause the slave's queries to deliver wrong answers.) This part is a deal-killer. 
It's a giant uphill slog to sell warm standby to those in charge of making resources available because the warm standby machine consumes SA time, bandwidth, power, rack space, etc., but provides no tangible benefit, and this feature would have exactly the same problem. IMHO, without the ability to do read-only queries on slaves, it's not worth doing this feature at all. Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote: On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote: This part is a deal-killer. It's a giant up-hill slog to sell warm standby to those in charge of making resources available because the warm standby machine consumes SA time, bandwidth, power, rack space, etc., but provides no tangible benefit, and this feature would have exactly the same problem. IMHO, without the ability to do read-only queries on slaves, it's not worth doing this feature at all. The only question I have is... what does this give us that PITR doesn't give us? Sincerely, Joshua D. Drake
Re: [HACKERS] Core team statement on replication in PostgreSQL
Marko, But Tom's mail gave me the impression that core wants to wait until we get a perfect read-only slave implementation, so we'd wait with it until 8.6, which does not seem sensible. If we can do a slightly inefficient (but simple) implementation right now, I see no reason to reject it; we can always improve it later. That's incorrect. We're looking for a workable solution. If we could get one for 8.4, that would be brilliant, but we think it's going to be harder than that. Publishing the XIDs back to the master is one possibility. We also looked at using spillover segments for vacuumed rows, but that seemed even less viable. I'm also thinking, for *async replication*, that we could simply halt replication on the slave whenever a transaction passes minxid on the master. However, the main focus will be on synchronous hot standby. --Josh
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote: On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote: This part is a deal-killer. It's a giant up-hill slog to sell warm standby to those in charge of making resources available because the warm standby machine consumes SA time, bandwidth, power, rack space, etc., but provides no tangible benefit, and this feature would have exactly the same problem. IMHO, without the ability to do read-only queries on slaves, it's not worth doing this feature at all. The only question I have is... what does this give us that PITR doesn't give us? It looks like a wrapper for PITR to me, so the gain would be ease of use. Cheers, David. -- David Fetter [EMAIL PROTECTED] http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: [EMAIL PROTECTED] Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, May 29, 2008 at 11:46 AM, Joshua D. Drake [EMAIL PROTECTED] wrote: The only question I have is... what does this give us that PITR doesn't give us? I think the idea is that WAL records would be shipped (possibly via socket) and applied as they're generated, rather than on a file-by-file basis. At least that's what real-time implies to me... -Doug
Re: [HACKERS] Core team statement on replication in PostgreSQL
Josh Berkus wrote: Marko, But Tom's mail gave me the impression that core wants to wait until we get a perfect read-only slave implementation, so we'd wait with it until 8.6, which does not seem sensible. If we can do a slightly inefficient (but simple) implementation right now, I see no reason to reject it; we can always improve it later. That's incorrect. We're looking for a workable solution. If we could get one for 8.4, that would be brilliant, but we think it's going to be harder than that. Publishing the XIDs back to the master is one possibility. We also looked at using spillover segments for vacuumed rows, but that seemed even less viable. I'm also thinking, for *async replication*, that we could simply halt replication on the slave whenever a transaction passes minxid on the master. However, the main focus will be on synchronous hot standby. Another idea I discussed with Tom is having the slave _delay_ applying WAL files until all slave snapshots are ready. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thu, May 29, 2008 at 4:48 PM, Douglas McNaught [EMAIL PROTECTED] wrote: On Thu, May 29, 2008 at 11:46 AM, Joshua D. Drake [EMAIL PROTECTED] wrote: The only question I have is... what does this give us that PITR doesn't give us? I think the idea is that WAL records would be shipped (possibly via socket) and applied as they're generated, rather than on a file-by-file basis. At least that's what real-time implies to me... Yes, we're talking real-time streaming (synchronous) log shipping. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
Re: [HACKERS] Core team statement on replication in PostgreSQL
On Thursday 29 May 2008 09:54:03 am Marko Kreen wrote: On 5/29/08, Tom Lane [EMAIL PROTECTED] wrote: The Postgres core team met at PGCon to discuss a few issues, the largest of which is the need for simple, built-in replication for PostgreSQL. Historically the project policy has been to avoid putting replication into core PostgreSQL, so as to leave room for development of competing solutions, recognizing that there is no one size fits all replication solution. However, it is becoming clear that this policy is hindering acceptance of PostgreSQL to too great an extent, compared to the benefit it offers to the add-on replication projects. Users who might consider PostgreSQL are choosing other database systems because our existing replication options are too complex to install and use for simple cases. In practice, simple asynchronous single-master-multiple-slave replication covers a respectable fraction of use cases, so we have concluded that we should allow such a feature to be included in the core project. We emphasize that this is not meant to prevent continued development of add-on replication projects that cover more complex use cases. We believe that the most appropriate base technology for this is probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon. We hope that such a feature can be completed for 8.4. +1 Although I would explain it more briefly - we do need a solution for lossless failover servers and such solution needs to live in core backend. +1 for lossless failover (ie, synchronous) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
On 5/29/08, David Fetter [EMAIL PROTECTED] wrote: On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote: Ideally this would be coupled with the ability to execute read-only queries on the slave servers, but we see technical difficulties that might prevent that from being completed before 8.5 or even further out. (The big problem is that long-running slave-side queries might still need tuples that are vacuumable on the master, and so replication of vacuuming actions would cause the slave's queries to deliver wrong answers.) This part is a deal-killer. It's a giant up-hill slog to sell warm standby to those in charge of making resources available because the warm standby machine consumes SA time, bandwidth, power, rack space, etc., but provides no tangible benefit, and this feature would have exactly the same problem. IMHO, without the ability to do read-only queries on slaves, it's not worth doing this feature at all. I would not be so harsh - I'd like to have the lossless standby even without read-only slaves. But Tom's mail gave me impression core wants to wait until we get perfect read-only slave implementation so we wait with it until 8.6, which does not seem sensible. If we can do slightly inefficient (but simple) implementation right now, I see no reason to reject it, we can always improve it later. Especially as it can be switchable. And we could also have transaction_timeout parameter on slaves so the hit on master is limited. -- marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Core team statement on replication in PostgreSQL
* Josh Berkus [EMAIL PROTECTED] [080529 11:52]: Marko, But Tom's mail gave me impression core wants to wait until we get perfect read-only slave implementation so we wait with it until 8.6, which does not seem sensible. If we can do slightly inefficient (but simple) implementation right now, I see no reason to reject it, we can always improve it later. That's incorrect. We're looking for a workable solution. If we could get one for 8.4, that would be brilliant but we think it's going to be harder than that. Publishing the XIDs back to the master is one possibility. We also looked at using spillover segments for vacuumed rows, but that seemed even less viable. I'm also thinking, for *async replication*, that we could simply halt replication on the slave whenever a transaction passes minxid on the master. However, the main focus will be on synchronous hot standby. Or, instead of statement timeout killing statements on the RO slave, simply kill any old transactions on the RO slave. Old in the sense that the master's xmin has passed it. And it's just an exercise in controlling the age of xmin on the master, which could even be done user-side. Doesn't fit all, but no one size does... It would work for where you're hammering your slaves with a diverse set of high-velocity short queries that you're trying to avoid on the master... An option to pause replay (making it async) or abort transactions (for sync) might make it possible to easily run an async slave for slow reporting queries, and a sync slave for short queries. a. -- Aidan Van Dyk Create like a god, [EMAIL PROTECTED] command like a king, http://www.highrise.ca/ work like a slave.
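[Editor's note: Aidan's alternative can be sketched as a policy check run on the slave whenever the master's xmin horizon advances: rather than a blanket statement timeout, cancel only those slave transactions whose snapshots the master has already passed. A hypothetical sketch; the names are invented for illustration, not PostgreSQL internals.]

```python
def transactions_to_cancel(master_xmin, slave_transactions):
    """Return the slave transactions whose snapshot xmin is older than
    the master's advancing xmin horizon. Replaying the corresponding
    vacuum records would give those transactions wrong answers, so
    they are cancelled instead of stalling replay."""
    return [txid for txid, snap_xmin in slave_transactions.items()
            if snap_xmin < master_xmin]
```

Short queries are never touched under this policy (their snapshots stay ahead of the master's xmin), which is why it suits the high-velocity read-offloading workload Aidan describes.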
Re: [HACKERS] Core team statement on replication in PostgreSQL
Joshua D. Drake wrote: On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote: On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote: This part is a deal-killer. It's a giant up-hill slog to sell warm standby to those in charge of making resources available because the warm standby machine consumes SA time, bandwidth, power, rack space, etc., but provides no tangible benefit, and this feature would have exactly the same problem. IMHO, without the ability to do read-only queries on slaves, it's not worth doing this feature at all. The only question I have is... what does this give us that PITR doesn't give us? Since people seem to be unclear on what we're proposing: 8.4 Synchronous Warm Standby: makes PostgreSQL more suitable for HA systems by eliminating failover data loss and cutting failover time. 8.5 (probably) Synchronous/Asynchronous Hot Standby: adds read-only queries on slaves to the above. Again, if we can implement queries on slaves for 8.4, we're all for it. However, after conversations in Core and with Simon we all think it's going to be too big a task to complete in 4-5 months. We *don't* want to end up delaying 8.4 for 5 months because we're debugging hot standby. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers