Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Agreed.  I realize why we are not zeroing those bytes (for performance),
  but can't we have the archiver zero those bytes before calling the
  'archive_command'?
 
 The archiver doesn't know any more about where the end-of-data is than
 the archive_command does.  Moreover, the archiver doesn't know whether
 the archive_command cares.  I think the separate module is a fine
 solution.
 
 It should also be pointed out that the whole thing becomes uninteresting
 if we get real-time log shipping implemented.  So I see absolutely no
 point in spending time integrating pg_clearxlogtail now.

People doing PITR are still going to be saving these files, and for a
long time, so I think this is still something we should try to address.

Added to TODO:

o Reduce PITR WAL file size by removing full page writes and
  by removing trailing bytes to improve compression

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Alvaro Herrera
Bruce Momjian wrote:

 Added to TODO:
 
 o Reduce PITR WAL file size by removing full page writes and
   by removing trailing bytes to improve compression

If we remove full page writes, how does hint bit setting get propagated
to the slave?

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Bruce Momjian
Alvaro Herrera wrote:
 Bruce Momjian wrote:
 
  Added to TODO:
  
  o Reduce PITR WAL file size by removing full page writes and
by removing trailing bytes to improve compression
 
 If we remove full page writes, how does hint bit setting get propagated
 to the slave?

We would remove full page writes that are needed for crash recovery, but
perhaps keep other full pages.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Alvaro Herrera
Bruce Momjian wrote:
 Alvaro Herrera wrote:
  Bruce Momjian wrote:
  
   Added to TODO:
   
   o Reduce PITR WAL file size by removing full page writes and
 by removing trailing bytes to improve compression
  
  If we remove full page writes, how does hint bit setting get propagated
  to the slave?
 
 We would remove full page writes that are needed for crash recovery, but
 perhaps keep other full pages.

How do you tell which is which?

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-07-15 Thread Bruce Momjian
Alvaro Herrera wrote:
 Bruce Momjian wrote:
  Alvaro Herrera wrote:
   Bruce Momjian wrote:
   
Added to TODO:

o Reduce PITR WAL file size by removing full page writes and
  by removing trailing bytes to improve compression
   
   If we remove full page writes, how does hint bit setting get propagated
   to the slave?
  
  We would remove full page writes that are needed for crash recovery, but
  perhaps keep other full pages.
 
 How do you tell which is which?

The WAL format would have to be modified to indicate which entries can
be discarded.
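[Editor's note: a minimal sketch of Bruce's idea. The record layout and flag names below are invented for illustration; the real WAL record format differs. The point is that each record header would carry a bit marking full-page images needed only for crash recovery, so an archiving filter could drop them.]

```python
import struct

# Invented header: 1-byte flags + 4-byte payload length, then the payload.
FLAG_FPI_CRASH_ONLY = 0x01  # full-page image needed only for crash recovery

def pack_record(flags, payload):
    return struct.pack("<BI", flags, len(payload)) + payload

def filter_for_archive(stream):
    """Copy records, discarding crash-recovery-only full-page images."""
    out, pos = bytearray(), 0
    while pos < len(stream):
        flags, length = struct.unpack_from("<BI", stream, pos)
        end = pos + 5 + length
        if not flags & FLAG_FPI_CRASH_ONLY:
            out += stream[pos:end]
        pos = end
    return bytes(out)

wal = (pack_record(FLAG_FPI_CRASH_ONLY, b"\x00" * 8192) +  # droppable FPI
       pack_record(0, b"hint-bit full page"))              # must be kept
archived = filter_for_archive(wal)
```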

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-16 Thread Kevin Grittner
On Mon, Jun 9, 2008 at 9:48 PM, in message [EMAIL PROTECTED],
Greg Smith [EMAIL PROTECTED] wrote:
 On Mon, 9 Jun 2008, Tom Lane wrote:
 
 It should also be pointed out that the whole thing becomes uninteresting
 if we get real-time log shipping implemented.  So I see absolutely no
 point in spending time integrating pg_clearxlogtail now.

 There are remote replication scenarios over a WAN (mainly aimed at
 disaster recovery) that want to keep a fairly updated database without
 putting too much traffic over the link.  People in that category really
 want zeroed tail+compressed archives, but probably not the extra overhead
 that comes with shipping smaller packets in a real-time implementation.
 
We ship the WAL files over a (relatively) slow WAN for disaster
recovery purposes, and we would be fine with replacing our current
techniques with real-time log shipping as long as:
 
(1)  We can do it asynchronously.  (i.e., we don't have to wait for
WAN latency to commit transactions.)
 
(2)  It can ship to multiple targets.  (Management dictates that we
have backups at the site of origin as well as our central site.  A
failure to replicate to one must not delay the other.)
 
(3)  It doesn't consume substantially more WAN bandwidth overall.
 
A solution which fails to cover any of these leaves pg_clearxlogtail
interesting to us.
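[Editor's note: requirement (2) above, independent shipping to multiple targets, can be sketched as the archive step dropping each completed segment into a separate spool directory per target, with independent senders draining each spool. All paths and names below are invented; a stalled WAN link to one site then never delays the other.]

```python
import shutil
import tempfile
from pathlib import Path

def archive_segment(segment, spools):
    """Copy one completed WAL segment into every target's spool directory."""
    for spool in spools:
        spool.mkdir(parents=True, exist_ok=True)
        tmp = spool / (segment.name + ".tmp")
        shutil.copy2(segment, tmp)
        # rename last, so senders only ever see fully written files
        tmp.rename(spool / segment.name)

base = Path(tempfile.mkdtemp())
seg = base / "000000010000000000000001"
seg.write_bytes(b"\x00" * 1024)  # stand-in for a 16 MB segment
archive_segment(seg, [base / "spool_onsite", base / "spool_central"])
```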
  
-Kevin



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-10 Thread Gregory Stark
Greg Smith [EMAIL PROTECTED] writes:

 On Mon, 9 Jun 2008, Tom Lane wrote:

 It should also be pointed out that the whole thing becomes uninteresting
 if we get real-time log shipping implemented.  So I see absolutely no
 point in spending time integrating pg_clearxlogtail now.

 There are remote replication scenarios over a WAN (mainly aimed at disaster
 recovery) that want to keep a fairly updated database without putting too much
 traffic over the link.  People in that category really want zeroed
 tail+compressed archives, but probably not the extra overhead that comes with
 shipping smaller packets in a real-time implementation.

Instead of zeroing bytes and depending on compression why not just pass an
extra parameter to the archive command with the offset to the logical end of
data. The archive_command could just copy from the start to that point and not
bother transferring the rest.
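[Editor's note: a minimal sketch of Gregory's suggestion. The end-of-data offset handed to the archive step is hypothetical; PostgreSQL's archive_command has no such parameter, which is exactly the gap being discussed.]

```python
import tempfile
from pathlib import Path

def copy_prefix(src, dst, end_of_data):
    """Copy only the useful prefix of a fixed-size segment."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read(end_of_data))

base = Path(tempfile.mkdtemp())
seg = base / "segment"
# 4000 bytes of real data followed by a stale, never-overwritten tail
seg.write_bytes(b"DATA" * 1000 + b"\x7f" * 100_000)
copy_prefix(seg, base / "segment.trimmed", end_of_data=4000)
```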

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-10 Thread Heikki Linnakangas

Gregory Stark wrote:

Instead of zeroing bytes and depending on compression why not just pass an
extra parameter to the archive command with the offset to the logical end of
data.


Because the archiver process doesn't have that information.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-10 Thread Josh Berkus
All,

  For the slave to not interfere with the master at all, we would need to
  delay application of WAL files on each slave until visibility on that
  slave allows the WAL to be applied, but in that case we would have
  long-running transactions delay data visibility of all slave sessions.

 Right, but you could segregate out long-running queries to one slave
 server that could be further behind than the others.

I still see having 2 different settings:

Synchronous: XID visibility is pushed to the master.  Maintains synchronous 
failover, and users are expected to run *1* master to *1* slave for most 
installations.

Asynchronous: replication stops on the slave whenever minxid gets out of
sync.  Could have multiple slaves, but with noticeable lag between master
and slave.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-10 Thread ITAGAKI Takahiro

Josh Berkus [EMAIL PROTECTED] wrote:

 I still see having 2 different settings:
 
 Synchronous: XID visibility is pushed to the master.  Maintains synchronous 
 failover, and users are expected to run *1* master to *1* slave for most 
 installations.
 
 Asynchronous: replication stops on the slave whenever minxid gets out of 
 synch.  Could have multiple slaves, but noticeable lag between master and 
 slave.

I agree that we should have a sync/async option for log shipping.
We could also have another setting: synchronous shipping with asynchronous
flushing. We wouldn't lose transactions unless both servers went down at
once, and we could avoid the delay of flushing WAL files to the primary's
disks.

As for multiple slaves, we could have a cascading configuration, where the
WAL receiver also delivers WAL records to other servers. I think it keeps
things simple if the postgres core supports only one-to-one replication,
with multiple slaves handled by third-party WAL receivers.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Bruce Momjian
Gurjeet Singh wrote:
 On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote:
 
  But since you mention it: one of the plausible answers for fixing the
  vacuum problem for read-only slaves is to have the slaves push an xmin
  back upstream to the master to prevent premature vacuuming.  The current
  design of pg_standby is utterly incapable of handling that requirement.
  So there might be an implementation dependency there, depending on how
  we want to solve that problem.
 
 
 I think it would be best to not make the slave interfere with the master's
 operations; that's only going to increase the operational complexity of such
 a solution.
 
 There could be multiple slaves following a master, some serving

For the slave to not interfere with the master at all, we would need to
delay application of WAL files on each slave until visibility on that
slave allows the WAL to be applied, but in that case we would have
long-running transactions delay data visibility of all slave sessions.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Bruce Momjian
Andreas 'ads' Scherbaum wrote:
 On Fri, 30 May 2008 16:22:41 -0400 (EDT) Greg Smith wrote:
 
  On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:
  
   Then you ship 16 MB of binary data every 30 seconds or every minute,
   but you only have a few kilobytes of real data in the logfile.
  
  Not if you use pg_clearxlogtail ( 
  http://www.2ndquadrant.com/replication.htm ), which got lost in the giant 
  March commitfest queue but should probably wander into contrib as part of 
  8.4.
 
 Yes, this topic was discussed several times in the past, but to solve it
 we need a patch/solution that is integrated into PG itself, not in
 contrib.

Agreed.  I realize why we are not zeroing those bytes (for performance),
but can't we have the archiver zero those bytes before calling the
'archive_command'?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Alvaro Herrera
Bruce Momjian wrote:

 Agreed.  I realize why we are not zeroing those bytes (for performance),
 but can't we have the archiver zero those bytes before calling the
 'archive_command'?

Perhaps make the zeroing user-settable.

-- 
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Gurjeet Singh wrote:
 There could be multiple slaves following a master, some serving

 For the slave to not interfere with the master at all, we would need to
 delay application of WAL files on each slave until visibility on that
 slave allows the WAL to be applied, but in that case we would have
 long-running transactions delay data visibility of all slave sessions.

Right, but you could segregate out long-running queries to one slave
server that could be further behind than the others.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Agreed.  I realize why we are not zeroing those bytes (for performance),
 but can't we have the archiver zero those bytes before calling the
 'archive_command'?

The archiver doesn't know any more about where the end-of-data is than
the archive_command does.  Moreover, the archiver doesn't know whether
the archive_command cares.  I think the separate module is a fine
solution.

It should also be pointed out that the whole thing becomes uninteresting
if we get real-time log shipping implemented.  So I see absolutely no
point in spending time integrating pg_clearxlogtail now.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Greg Smith

On Mon, 9 Jun 2008, Tom Lane wrote:


It should also be pointed out that the whole thing becomes uninteresting
if we get real-time log shipping implemented.  So I see absolutely no
point in spending time integrating pg_clearxlogtail now.


There are remote replication scenarios over a WAN (mainly aimed at 
disaster recovery) that want to keep a fairly updated database without 
putting too much traffic over the link.  People in that category really 
want zeroed tail+compressed archives, but probably not the extra overhead 
that comes with shipping smaller packets in a real-time implementation.
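[Editor's note: a toy demonstration of why the zeroed-tail approach matters over a WAN. The segment is scaled down to 4 MB to keep the demo quick; a recycled segment whose unused tail still holds stale bytes barely compresses, while a zeroed tail compresses almost to nothing.]

```python
import os
import zlib

SEG = 4 * 1024 * 1024
records = b"real wal data " * 512                 # ~7 KB of actual records
dirty = records + os.urandom(SEG - len(records))  # stale, incompressible tail
clean = records + b"\x00" * (SEG - len(records))  # tail zeroed before shipping

dirty_gz = len(zlib.compress(dirty))  # nearly the full segment size
clean_gz = len(zlib.compress(clean))  # a few KB
```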


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-09 Thread Koichi Suzuki
Just for information.

Regarding archive compression, I have an archive-log compression tool,
which can be found at http://pgfoundry.org/projects/pglesslog/

This feature is also included in NTT's synchronized log-shipping
replication presented at the last PGCon.

2008/6/10 Greg Smith [EMAIL PROTECTED]:
 On Mon, 9 Jun 2008, Tom Lane wrote:

 It should also be pointed out that the whole thing becomes uninteresting
 if we get real-time log shipping implemented.  So I see absolutely no
 point in spending time integrating pg_clearxlogtail now.

 There are remote replication scenarios over a WAN (mainly aimed at disaster
 recovery) that want to keep a fairly updated database without putting too
 much traffic over the link.  People in that category really want zeroed
 tail+compressed archives, but probably not the extra overhead that comes
 with shipping smaller packets in a real-time implementation.

 --
 * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD





-- 
Koichi Suzuki



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-05 Thread Gregory Stark
Jeff Davis [EMAIL PROTECTED] writes:

 On Wed, 2008-06-04 at 14:17 +0300, Heikki Linnakangas wrote:
  Would that also cover possible differences in page size, 32bit OS vs.
  64bit OS, different timestamp flavour, etc. issues ? AFAIR, all these
  things can have an influence on how the data is written and possibly
  make the WAL incompatible with other postgres instances, even if the
  exact same version...
 
 These are already covered by the information in pg_control.

 Another thing that can change between systems is the collation behavior,
 which can corrupt indexes (and other bad things).

Well, yes and no. It's entirely possible, for example, for a minor release of
an OS to tweak the collation rules for a collation without changing the name.
For the sake of argument they might just be fixing a bug in the collation
rules. From the point of view of the OS that's a minor bug fix that they might
not foresee causing data corruption problems.

Pegging pg_control to a particular release of the OS would be pretty terrible
though. I don't really see an out for this. But it's another roadblock to
consider akin to not-really-immutable index expressions for any proposal
which depends on re-finding index pointers :(

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Markus Schiltknecht

Hello Andrew,

Andrew Sullivan wrote:

Yes.  And silent as ever. :-)


Are the slides of your PgCon talk available for download somewhere?

BTW: up until recently, there was yet another mailing list: 
[EMAIL PROTECTED] It was less focused on hooks 
and got at least some traffic. :-) Are those mails still archived somewhere?


Regards

Markus






Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Heikki Linnakangas

Stephen Denne wrote:

Hannu Krosing wrote:

 The simplest form of synchronous wal shipping would not even need
 postgresql running on slave, just a small daemon which reports when wal
 blocks are a) received and b) synced to disk.

While that does sound simple, I'd presume that most people would want the
guarantee of the same version of postgresql installed wherever the logs
are ending up, with the log receiver speaking the same protocol version as
the log sender. I imagine that would be most easily achieved through using
something like the continuously-restoring startup mode of current
postgresql.


Hmm, WAL version compatibility is an interesting question. Most minor
releases haven't changed the WAL format, and it would be nice to allow
running different minor versions in the master and slave in those cases.
But it's certainly not unheard of to change the WAL format. Perhaps we
should introduce a WAL version number, similar to the catalog version?
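[Editor's note: a hypothetical sketch of Heikki's suggestion; the field layout and version number are invented. Each WAL page header would carry a WAL-format version, analogous to the catalog version, and the standby would refuse pages written in a format it doesn't understand.]

```python
import struct

WAL_FORMAT_VERSION = 200806041  # e.g. date-based, like catalog versions

def make_page_header():
    # invented layout: a 4-byte little-endian version at the page start
    return struct.pack("<I", WAL_FORMAT_VERSION)

def check_page_header(header):
    """Raise if the page was written in a WAL format this server can't read."""
    (version,) = struct.unpack_from("<I", header)
    if version != WAL_FORMAT_VERSION:
        raise ValueError("WAL format mismatch: segment written with %d, "
                         "this server expects %d"
                         % (version, WAL_FORMAT_VERSION))
    return version
```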


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Teodor Sigaev


 Hmm, WAL version compatibility is an interesting question. Most minor
 releases haven't changed the WAL format, and it would be nice to allow

As I remember, a higher minor version should be able to read all WAL
written by lower ones, but that isn't true in the opposite direction, nor
between different major versions.

 running different minor versions in the master and slave in those cases.
 But it's certainly not unheard of to change the WAL format. Perhaps we
 should introduce a WAL version number, similar to catalog version?

Agreed. Right now this only affects warm-standby servers, but introducing
simple log shipping, and replication based on it, will cause a lot of
non-obvious errors/bugs. Is it possible to use the catalog version number
as the WAL version?


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Csaba Nagy
On Wed, 2008-06-04 at 11:13 +0300, Heikki Linnakangas wrote:
 Hmm, WAL version compatibility is an interesting question. Most minor 
 releases haven't changed the WAL format, and it would be nice to allow 
 running different minor versions in the master and slave in those cases. 
 But it's certainly not unheard of to change the WAL format. Perhaps we 
 should introduce a WAL version number, similar to catalog version?

Would that also cover possible differences in page size, 32bit OS vs.
64bit OS, different timestamp flavour, etc. issues ? AFAIR, all these
things can have an influence on how the data is written and possibly
make the WAL incompatible with other postgres instances, even if the
exact same version...

Cheers,
Csaba.





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Heikki Linnakangas

Csaba Nagy wrote:

On Wed, 2008-06-04 at 11:13 +0300, Heikki Linnakangas wrote:
Hmm, WAL version compatibility is an interesting question. Most minor 
releases haven't changed the WAL format, and it would be nice to allow 
running different minor versions in the master and slave in those cases. 
But it's certainly not unheard of to change the WAL format. Perhaps we 
should introduce a WAL version number, similar to catalog version?


Would that also cover possible differences in page size, 32bit OS vs.
64bit OS, different timestamp flavour, etc. issues ? AFAIR, all these
things can have an influence on how the data is written and possibly
make the WAL incompatible with other postgres instances, even if the
exact same version...


These are already covered by the information in pg_control.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Heikki Linnakangas

Teodor Sigaev wrote:

Is it possible to use catalog version number as WAL version?


No, because we don't change the catalog version number in minor 
releases, even though we might change WAL format.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Andrew Sullivan
On Wed, Jun 04, 2008 at 09:24:20AM +0200, Markus Schiltknecht wrote:

 Are the slides of your PgCon talk available for download somewhere?

There weren't any slides, really (there were 4 that I put up in case
the cases I was discussing needed back-references, but they didn't).
Joshua tells me that I'm supposed to make the paper readable and put
it up on Command Prompt's website, so I will soon.

 BTW: up until recently, there was yet another mailing list: 
 [EMAIL PROTECTED] It was less focused on hooks 
 and got at least some traffic. :-) Are those mails still archived 
 somewhere?

Unless whoever was operating that list moved it to pgfoundry, I doubt
it (except on backups somewhere).

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Hmm, WAL version compatibility is an interesting question. Most minor 
 releases haven't changed the WAL format, and it would be nice to allow 
 running different minor versions in the master and slave in those cases. 
 But it's certainly not unheard of to change the WAL format. Perhaps we 
 should introduce a WAL version number, similar to catalog version?

Yeah, perhaps.  In the past we've changed the WAL page ID field for
this; I'm not sure if that's enough or not.  It does seem like a good
idea to have a way to check that the slaves aren't trying to read a
WAL version they don't understand.  Also, it's possible that the WAL
format doesn't change across a major update, but you still couldn't
work with say an 8.4 master and an 8.3 slave, so maybe we need the
catalog version ID in there too.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Hannu Krosing
On Wed, 2008-06-04 at 10:40 -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Hmm, WAL version compatibility is an interesting question. Most minor 
  releases haven't changed the WAL format, and it would be nice to allow 
  running different minor versions in the master and slave in those cases. 
  But it's certainly not unheard of to change the WAL format. Perhaps we 
  should introduce a WAL version number, similar to catalog version?
 
 Yeah, perhaps.  In the past we've changed the WAL page ID field for
 this; I'm not sure if that's enough or not.  It does seem like a good
 idea to have a way to check that the slaves aren't trying to read a
 WAL version they don't understand.  Also, it's possible that the WAL
 format doesn't change across a major update, but you still couldn't
 work with say an 8.4 master and an 8.3 slave, so maybe we need the
 catalog version ID in there too.

And something dependent on whether datetime is integer-based.

We probably won't need to encode the presence of user-defined types, like
PostGIS, will we?

-
Hannu





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 On Wed, 2008-06-04 at 10:40 -0400, Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 Hmm, WAL version compatibility is an interesting question. Most minor 
  releases haven't changed the WAL format, and it would be nice to allow 
 running different minor versions in the master and slave in those cases. 
 But it's certainly not unheard of to change the WAL format. Perhaps we 
 should introduce a WAL version number, similar to catalog version?
 
 Yeah, perhaps.  In the past we've changed the WAL page ID field for
 this; I'm not sure if that's enough or not.  It does seem like a good
 idea to have a way to check that the slaves aren't trying to read a
 WAL version they don't understand.  Also, it's possible that the WAL
 format doesn't change across a major update, but you still couldn't
 work with say an 8.4 master and an 8.3 slave, so maybe we need the
 catalog version ID in there too.

 And something dependent on datetime being integer.

This thread is getting out of hand, actually.

Heikki's earlier comment about pg_control reminded me that we already
have a unique system identifier stored in pg_control and check that
against WAL headers.  So I think we already have enough certainty that
the master and slaves have the same pg_control and hence are the same
for everything checked by pg_control.

However, since by definition pg_control doesn't change in a minor
upgrade, there isn't any easy way to enforce a rule like slaves must be
same or newer minor version as the master.  I'm not sure that we
actually *want* to enforce such a rule, though.  Most of the time, the
other way around would work fine.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Simon Riggs

On Wed, 2008-06-04 at 11:37 -0400, Tom Lane wrote:

 This thread is getting out of hand, actually.

Agreed. We should start new threads for specific things. Please.

 However, since by definition pg_control doesn't change in a minor
 upgrade, there isn't any easy way to enforce a rule like slaves must be
 same or newer minor version as the master.  I'm not sure that we
 actually *want* to enforce such a rule, though. 

Definitely don't want to prevent minor version mismatches. We want to be
able to upgrade a standby, have it catch up with the master, then switch
over to the new version. Otherwise we'd have to take the whole replicated
system down to do minor upgrades/backouts. Ugh!

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Jeff Davis
On Wed, 2008-06-04 at 14:17 +0300, Heikki Linnakangas wrote:
  Would that also cover possible differences in page size, 32bit OS vs.
  64bit OS, different timestamp flavour, etc. issues ? AFAIR, all these
  things can have an influence on how the data is written and possibly
  make the WAL incompatible with other postgres instances, even if the
  exact same version...
 
 These are already covered by the information in pg_control.

Another thing that can change between systems is the collation behavior,
which can corrupt indexes (and other bad things).

Regards,
Jeff Davis




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 On Wed, 2008-06-04 at 14:17 +0300, Heikki Linnakangas wrote:
 These are already covered by the information in pg_control.

 Another thing that can change between systems is the collation behavior,
 which can corrupt indexes (and other bad things).

That is covered by pg_control, at least to the extent of forcing the
same value of LC_COLLATE.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Jeff Davis
On Wed, 2008-06-04 at 14:23 -0400, Tom Lane wrote:
 That is covered by pg_control, at least to the extent of forcing the
 same value of LC_COLLATE.

But the same LC_COLLATE means different things on different systems.
Even en_US means something different on Mac versus Linux.

Regards,
Jeff Davis




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Koichi Suzuki
Well, the WAL format doesn't only depend on WAL itself, but also depends
on each resource manager.  If we introduce WAL format version
identification, ISTM that we have to take care of matching the
resource managers in the master and the slave as well.

2008/6/4 Heikki Linnakangas [EMAIL PROTECTED]:
 Stephen Denne wrote:

 Hannu Krosing wrote:

 The simplest form of synchronous wal shipping would not even need
 postgresql running on slave, just a small daemon which reports when wal
 blocks are a) received and b) synced to disk.

 While that does sound simple, I'd presume that most people would want the
 guarantee of the same version of postgresql installed wherever the logs are
 ending up, with the log receiver speaking the same protocol version as the
 log sender. I imagine that would be most easily achieved through using
 something like the continuously restoring startup mode of current
 postgresql.

 Hmm, WAL version compatibility is an interesting question. Most minor
 releases hasn't changed the WAL format, and it would be nice to allow
 running different minor versions in the master and slave in those cases. But
 it's certainly not unheard of to change the WAL format. Perhaps we should
 introduce a WAL version number, similar to catalog version?

 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com





-- 
--
Koichi Suzuki



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Tom Lane
Koichi Suzuki [EMAIL PROTECTED] writes:
 Well, WAL format doesn't only depend on WAL itself, but also depend on
 each resource manager.   If we introduce WAL format version
 identification, ISTM that we have to take care of the matching of
 resource manager in the master and the slave as well.

That seems a bit overdesigned.  What are the prospects that two builds
of the same Postgres version are going to have different sets of
resource managers in them?

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-04 Thread Koichi Suzuki
If the versions of the master and the slave are different and we'd still
like to allow log-shipping replication, we need to negotiate whether the
WAL formats of the two are compatible.  I hope this is not in our scope
and I'm worrying too much.

2008/6/5 Tom Lane [EMAIL PROTECTED]:
 Koichi Suzuki [EMAIL PROTECTED] writes:
 Well, WAL format doesn't only depend on WAL itself, but also depend on
 each resource manager.   If we introduce WAL format version
 identification, ISTM that we have to take care of the matching of
 resource manager in the master and the slave as well.

 That seems a bit overdesigned.  What are the prospects that two builds
 of the same Postgres version are going to have different sets of
 resource managers in them?

regards, tom lane




-- 
--
Koichi Suzuki



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-03 Thread Hannu Krosing
On Mon, 2008-06-02 at 22:40 +0200, Andreas 'ads' Scherbaum wrote:
 On Mon, 02 Jun 2008 11:52:05 -0400 Chris Browne wrote:
 
  [EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes:
   On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
  
   Well, yes, but you do know about archive_timeout, right? No need to wait 
   2 hours.
  
   Then you ship 16 MB binary stuff every 30 second or every minute but
   you only have some kbyte real data in the logfile. This must be taken
   into account, especially if you ship the logfile over the internet
   (means: no high-speed connection, maybe even pay-per-traffic) to the
   slave.
  
  If you have that kind of scenario, then you have painted yourself into
  a corner, and there isn't anything that can be done to extract you
  from it.
 
 You are misunderstanding something. It's perfectly possible that you
 have a low-traffic database with changes every now and then. But you
 have to copy a full 16 MB logfile every 30 seconds or every minute just
 to have the slave up-to-date.

To repeat my other post in this thread:

Actually we can already do better than file-by-file by using
pg_xlogfile_name_offset(), which was added sometime in 2006.  walmgr.py
from the SkyTools package, for example, does this to get no more than a
few seconds' failure window, and it copies just the changed part of the
WAL to the slave.

pg_xlogfile_name_offset() was added just for this purpose - to enable
WAL shipping scripts to query where inside the logfile the current write
pointer is.

It is not synchronous, but it can be made very close - within a
subsecond window if you poll frequently enough.
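To make the idea concrete, here is a minimal sketch (not walmgr.py's actual code) of the bookkeeping such a script does: given the previous and current (filename, offset) positions, as a script would obtain from pg_xlogfile_name_offset(), compute which byte ranges still need to be shipped.  The function name and tuple shape are illustrative assumptions.

```python
# Sketch of incremental WAL shipping: instead of waiting for a full
# 16 MB segment, poll the current write position and copy only the
# newly written bytes.  Positions are (filename, offset) pairs like
# those returned by pg_xlogfile_name_offset(); everything here is a
# simplified illustration.

def bytes_to_ship(prev, cur, seg_size=16 * 1024 * 1024):
    """Return (filename, start, length) chunks to copy, given the
    previous and current (filename, offset) write positions."""
    prev_file, prev_off = prev
    cur_file, cur_off = cur
    if cur_file == prev_file:
        # Still inside the same segment: ship just the delta.
        return [(cur_file, prev_off, cur_off - prev_off)]
    # Segment switched: finish the old file, then the new one from 0.
    return [(prev_file, prev_off, seg_size - prev_off),
            (cur_file, 0, cur_off)]
```

A polling loop would call this every second or so, which is what keeps the failure window down to roughly the polling interval.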

---
Hannu





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-03 Thread Andrew Sullivan
On Sun, Jun 01, 2008 at 01:43:22PM -0400, Tom Lane wrote:
 power to him.  (Is the replica-hooks-discuss list still working?)  But

Yes.  And silent as ever. :-)

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-03 Thread Stephen Denne
Hannu Krosing wrote:
 The simplest form of synchronous wal shipping would not even need
 postgresql running on slave, just a small daemon which 
 reports when wal
 blocks are a) received and b) synced to disk. 

While that does sound simple, I'd presume that most people would want the 
guarantee of the same version of postgresql installed wherever the logs are 
ending up, with the log receiver speaking the same protocol version as the log 
sender. I imagine that would be most easily achieved through using something 
like the continuously restoring startup mode of current postgresql.

However, variations on this kind of daemon can be used to perform testing,
configuring it to work well, go slow, pause, not respond, disconnect, or fail
in particular ways, emulate a full disk, etc.

Regards,
Stephen Denne.





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-02 Thread Robert Hodges
Hi Hannu,


On 6/1/08 2:14 PM, Hannu Krosing [EMAIL PROTECTED] wrote:


 As a consequence, I don't see how you can get around doing some sort
 of row-based replication like all the other databases.

 Is'nt WAL-base replication some sort of row-based replication ?

Yes, in theory.  However, there's a big difference between replicating
physical WAL records and doing logical replication with SQL statements.
Logical replication requires extra information to reconstruct primary keys.
(Somebody tell me if this is already in the WAL; I'm learning the code as
fast as possible but assuming for now it's not.)


  Now that people are starting to get religion on this issue I would
 strongly advocate a parallel effort to put in a change-set extraction
 API that would allow construction of comprehensive master/slave
 replication.

 Triggers. see pgQ's logtrigga()/logutrigga(). See slides for Marko
 Kreen's presentation at pgCon08.


Thanks very much for the pointer.  The slides look interesting.

Robert




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-02 Thread Chris Browne
[EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes:
 On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:

 Well, yes, but you do know about archive_timeout, right? No need to wait 
 2 hours.

 Then you ship 16 MB binary stuff every 30 second or every minute but
 you only have some kbyte real data in the logfile. This must be taken
 into account, especially if you ship the logfile over the internet
 (means: no high-speed connection, maybe even pay-per-traffic) to the
 slave.

If you have that kind of scenario, then you have painted yourself into
a corner, and there isn't anything that can be done to extract you
from it.

Consider: If you have so much update traffic that it is too much to
replicate via WAL-copying, why should we expect that other mechanisms
*wouldn't* also overflow the connection?

If you haven't got enough network bandwidth to use this feature, then
nobody is requiring that you use it.  It seems like a perfectly
reasonable prerequisite to say this requires that you have enough
bandwidth.
-- 
(reverse (concatenate 'string ofni.secnanifxunil @ enworbbc))
http://www3.sympatico.ca/cbbrowne/
There's nothing worse than having only one drunk head.
-- Zaphod Beeblebrox



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-02 Thread Hannu Krosing
On Thu, 2008-05-29 at 23:37 +0200, Mathias Brossard wrote:

 I pointed out that the NTT solution is synchronous because Tom said in 
 the first part of his email that:
 
   In practice, simple asynchronous single-master-multiple-slave
   replication covers a respectable fraction of use cases, so we have
   concluded that we should allow such a feature to be included in the
   core project.
 
 ... and yet the most appropriate base technology for this is 
 synchronous and maybe I should have also pointed out in my previous mail 
 is that it doesn't support multiple slaves.

I don't think that you need too many slaves in sync mode.

Probably making the first slave sync and the others async from there on
will be good enough.

 Also, as other have pointed out there are different interpretations of 
 synchronous depending on wether the WAL data has reached the other end 
 of the network connection, a safe disk checkpoint or the slave DB itself.

Probably all of DRBD's levels (A: data sent to network, B: data received,
C: data written to disk) should be supported, plus C1: data replayed in
the slave DB - C1 meaning that it can be done in parallel with C.

Then each DBA can set it up depending on what he trusts - network,
slave's power supply or slaves' disk.

Also, the case of slave failure should be addressed. I don't think that
the best solution is halting all ops on the master if the slave/network fails.

Maybe we should also allow a setup with 2-3 slaves, where operations can
continue as long as at least 1 slave is syncing?
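The acknowledgement levels above can be sketched as a small decision function on the master: a commit is considered complete once the slave has reported every event up to the configured level.  The level names and event model here are illustrative, not any proposed API.

```python
# Toy model of the DRBD-style acknowledgement levels discussed above:
# A = sent to network, B = received, C = synced to disk, plus the
# proposed C1 = replayed in the slave DB.  The master treats a commit
# as durable once the slave has reported all events up to the
# configured level.

LEVELS = ["sent", "received", "synced", "replayed"]  # A, B, C, C1

def commit_complete(level, slave_events):
    """True once the slave has reported every event up to `level`."""
    needed = LEVELS[: LEVELS.index(level) + 1]
    return all(e in slave_events for e in needed)
```

The DBA's choice of level then maps directly to what is trusted: the network ("sent"), the slave's power supply ("received"/"synced"), or the slave's ability to serve fresh reads ("replayed").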

--
Hannu




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-02 Thread Andreas 'ads' Scherbaum
On Mon, 02 Jun 2008 11:52:05 -0400 Chris Browne wrote:

 [EMAIL PROTECTED] (Andreas 'ads' Scherbaum) writes:
  On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
 
  Well, yes, but you do know about archive_timeout, right? No need to wait 
  2 hours.
 
  Then you ship 16 MB binary stuff every 30 second or every minute but
  you only have some kbyte real data in the logfile. This must be taken
  into account, especially if you ship the logfile over the internet
  (means: no high-speed connection, maybe even pay-per-traffic) to the
  slave.
 
 If you have that kind of scenario, then you have painted yourself into
 a corner, and there isn't anything that can be done to extract you
 from it.

You are misunderstanding something. It's perfectly possible that you
have a low-traffic database with changes every now and then. But you
have to copy a full 16 MB logfile every 30 seconds or every minute just
to have the slave up-to-date.


 Consider: If you have so much update traffic that it is too much to
 replicate via WAL-copying, why should we expect that other mechanisms
 *wouldn't* also overflow the connection?

For a few MB of real data you copy several GB of logfiles per day -
that's a lot of overhead, isn't it?


 If you haven't got enough network bandwidth to use this feature, then
 nobody is requiring that you use it.  It seems like a perfectly
 reasonable prerequisite to say this requires that you have enough
 bandwidth.

If you have a high-traffic database, then of course you need a different
connection than if you only have a low-traffic or mostly read-only
database. But that's not the point. Copying an almost unused 16 MB WAL
logfile is just overhead - especially because the logfile is not very
compressible, due to all the leftovers from earlier use.
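The compressibility point is easy to demonstrate: a mostly idle segment whose unused tail still holds leftover bytes from the segment's previous use compresses poorly, while the same segment with a zeroed tail (which is what a tool like pg_clearxlogtail produces) shrinks to roughly the size of the real data.  Sizes below are scaled down from 16 MB for speed; this is an illustration, not a measurement of real WAL files.

```python
# Compare compression of a "dirty" segment (tail = old leftover bytes)
# against a "clean" one (tail = zeros), using random bytes as a stand-in
# for incompressible WAL records.
import os
import zlib

SEG = 1024 * 1024              # stand-in for a 16 MB WAL segment
used = os.urandom(32 * 1024)   # ~32 kB of real records in this segment

dirty = used + os.urandom(SEG - len(used))  # tail: leftovers from earlier use
clean = used + b"\x00" * (SEG - len(used))  # tail: zero-filled

dirty_z = len(zlib.compress(dirty))  # stays near the full segment size
clean_z = len(zlib.compress(clean))  # shrinks to about the real-data size
```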


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread Robert Hodges
Hi Merlin,

My point here is that with reasonably small extensions to the core you can 
build products that are a lot better than SLONY.   Triggers do not cover DDL, 
among other issues, and it's debatable whether they are the best way to 
implement quorum policies like Google's semi-synchronous replication.  As I 
mentioned separately this topic deserves another thread which I promise to 
start.

It is of course possible to meet some of these needs with an appropriate client 
interface to WAL shipping.  There's no a-priori reason why built-in PostgreSQL 
slaves need to be the only client.  I would put a vote in for covering this 
possibility in the initial replication design.  We are using a very similar 
approach in our own master/slave replication product.

Thanks, Robert

P.S., No offense intended to Jan Wieck et al.  There are some pretty cool 
things in SLONY.

On 5/29/08 8:16 PM, Merlin Moncure [EMAIL PROTECTED] wrote:

On Thu, May 29, 2008 at 3:05 PM, Robert Hodges
[EMAIL PROTECTED] wrote:
 Third, you can't stop with just this feature.  (This is the BUT part of the
 post.)  The use cases not covered by this feature area actually pretty
 large.  Here are a few that concern me:

 1.) Partial replication.
 2.) WAN replication.
 3.) Bi-directional replication.  (Yes, this is evil but there are problems
 where it is indispensable.)
 4.) Upgrade support.  Aside from database upgrade (how would this ever
 really work between versions?), it would not support zero-downtime app
 upgrades, which depend on bi-directional replication tricks.
 5.) Heterogeneous replication.
 6.) Finally, performance scaling using scale-out over large numbers of
 replicas.  I think it's possible to get tunnel vision on this-it's not a big
 requirement in the PG community because people don't use PG in the first
 place when they want to do this.  They use MySQL, which has very good
 replication for performance scaling, though it's rather weak for
 availability.

These type of things are what Slony is for.  Slony is trigger based.
This makes it more complex than log shipping style replication, but
provides lots of functionality.

wal shipping based replication is maybe the fastest possible
solution...you are already paying the overhead so it comes virtually
for free from the point of view of the master.

mysql replication is imo nearly worthless from backup standpoint.

merlin



--
Robert Hodges, CTO, Continuent, Inc.
Email:  [EMAIL PROTECTED]
Mobile:  +1-510-501-3728  Skype:  hodgesrm


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread Merlin Moncure
On Sun, Jun 1, 2008 at 11:58 AM, Robert Hodges
[EMAIL PROTECTED] wrote:
 Hi Merlin,

 My point here is that with reasonably small extensions to the core you can
 build products that are a lot better than SLONY.   Triggers do not cover
 DDL, among other issues, and it's debatable whether they are the best way to
 implement quorum policies like Google's semi-synchronous replication.  As I
 mentioned separately this topic deserves another thread which I promise to
 start.

These issues are much discussed and well understood.  At this point,
the outstanding points of discussion are technical...how to make this
thing work.

merlin



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 On Sun, Jun 1, 2008 at 11:58 AM, Robert Hodges
 [EMAIL PROTECTED] wrote:
 My point here is that with reasonably small extensions to the core you can
 build products that are a lot better than SLONY.

 These issues are much discussed and well understood.

Well, what we know is that previous attempts to define replication hooks
to be added to the core have died for lack of interest.  Maybe Robert
can start a new discussion that will actually get somewhere; if so, more
power to him.  (Is the replica-hooks-discuss list still working?)  But
that is entirely orthogonal to what is proposed in this thread, which
is to upgrade the existing PITR support into a reasonably useful
replication feature.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread Dawid Kuroczko
On Thu, May 29, 2008 at 4:12 PM, Tom Lane [EMAIL PROTECTED] wrote:
 The Postgres core team met at PGCon to discuss a few issues, the largest
 of which is the need for simple, built-in replication for PostgreSQL.
[...]
 We believe that the most appropriate base technology for this is
 probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
 We hope that such a feature can be completed for 8.4.  Ideally this
 would be coupled with the ability to execute read-only queries on the
 slave servers, but we see technical difficulties that might prevent that
 from being completed before 8.5 or even further out.  (The big problem
 is that long-running slave-side queries might still need tuples that are
 vacuumable on the master, and so replication of vacuuming actions would
 cause the slave's queries to deliver wrong answers.)

 Again, this will not replace Slony, pgPool, Continuent, Londiste, or
 other systems for many users, as it will be not be highly scalable nor
 support long-distance replication nor replicating less than an entire
 installation.  But it is time to include a simple, reliable basic
 replication feature in the core system.

Hello!

I thought I would share a few thoughts of my own about the issue.
I have hands-on experience with Oracle and MySQL apart from
PostgreSQL, so I hope it will be a bit interesting.

The former has a feature called physical standby, which looks
quite like our WAL-shipping based replication.  Simply put, archived
logs are replayed on the standby database.  The primary database
and standby database are connected, and can stream the logs
directly.  They either copy the log when it's finished (as we do now)
or do it in a continuous manner (as I hope we will be able to).

It is possible to have synchronous replication (where COMMIT
on the primary database succeeds only when the data is safely stored on
the standby database).  I think such a feature would be a great
advantage for PostgreSQL (where you cannot afford to lose
any transactions).

Their standby database is not accessible.  It can be opened read-only,
but during that time replication stops.  So PostgreSQL having a
read-only, still-replicating standby database would be great.

The other method is logical standby which works by dissecting
WAL-logs and recreating DDLs/DMLs from it.  Never seen anyone
use it. ;-)

Then we have MySQL replication - done by replaying the actual DDLs/DMLs
on the slaves.  This approach has issues, most notably when slaves are
highly loaded and lag behind the master - so you end up with infrastructure
to monitor lag and turn off slaves which lag too much.  Also it is painful
to set up - you have to stop, copy, configure and run.

* Back to PostgreSQL world

As for PostgreSQL solutions we have Slony-I, which is great as long as
you don't have too many people managing the database and/or your
schema doesn't change too frequently.  Perhaps it would be more easily
maintainable if there were a way to capture DDLs (as DDL triggers or
similar).  Its main advantages for me are the ability to prepare complex
setups and easily add new slaves.  The pgpool solution is quite nice,
but then again adding a new slave is not so easy.  And being a filtering
layer between client and server it feels a bit fragile (I know it is not,
but then again it is harder to convince someone that yes, it will work
100% right all the time).

* How I would like PostgreSQL WAL-replication to evolve:

First of all it would be great if a slave/standby would contact the master
and maintain its state with it (tell it its xmin, request a log to stream,
go to online streaming).  In particular I hope that it should be possible
to make a switchover (where the two databases exchange roles),
and here the direct connection between the two should help.

In detail, I think it should go like this:
* A slave database starts up, checks that it works as a replica
(hopefully it would not be a postgresql.conf constant, but rather
some file maintained by the database).
* It would connect to the master database, tell it where in the WAL
it is now, and request log N.
* If log N is not available, request a log from external supplied
script (so that it could be fetched from log archive repository
somewhere, recovered from a backup tape, etc).
* Continue asking, until we get to the logs which are available
at master database.
* Continue replaying until we get within max_allowed_replication_lag
time, and open our slave for read-only queries.
* If we start lagging too much perhaps close the read-only access
to the database (perhaps configurable?).
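The catch-up sequence sketched in the list above boils down to a small per-segment decision: fetch from the archive script, stream from the master, serve read-only queries, or stay closed while lagging.  A toy version (all names hypothetical, not a proposed interface):

```python
# Minimal sketch of the proposed slave catch-up logic: walk forward
# from the slave's WAL position, pulling old segments from an external
# archive script until reaching what the master still has online, then
# stream; open for read-only queries only once the lag is small enough.

def next_action(pos, archive_has, master_has, lag_seconds, max_lag=30):
    """Decide what the slave should do for the segment at `pos`."""
    if pos in master_has:
        if lag_seconds <= max_lag:
            return "stream-and-serve-read-only"
        return "stream-catching-up"     # too far behind: stay closed
    if pos in archive_has:
        return "fetch-from-archive"     # external restore-style script
    return "wait"                       # segment not produced yet
```

The max_lag parameter plays the role of the max_allowed_replication_lag setting suggested above.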

I think that replication should be easy to set up.  I think our
archive_command is quite easy, but many a person comes
with a lot of misconceptions about how it works (and it takes time
to explain to them how it actually works, especially what
archive_command is for, and that pg_start_backup() doesn't
actually _do_ a backup, but just tells PostgreSQL that
a backup is being done).

Easy to setup and easy to switchover (change the 

Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread James Mansion

David Fetter wrote:

This part is a deal-killer.  It's a giant up-hill slog to sell warm
standby to those in charge of making resources available because the
warm standby machine consumes SA time, bandwidth, power, rack space,
etc., but provides no tangible benefit, and this feature would have
exactly the same problem.

IMHO, without the ability to do read-only queries on slaves, it's not
worth doing this feature at all.
  
That's not something that squares with my experience *at all*, which
admittedly is entirely in investment banks.  Business continuity is
king, and in some places the warm standby rep from the database vendor
is trusted more than block-level rep from the SAN vendor (though that
may be changing to some extent in favour of the SAN).

James




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread James Mansion

Aidan Van Dyk wrote:

 The whole single-threaded WAL replay problem is going to rear its ugly
 head here too, and mean that a slave *won't* be able to keep up with a
 busy master if it's actually trying to apply all the changes in
 real-time.

Is there a reason to commit at the same points that the master
committed?  Wouldn't relaxing that mean that at least you would get
'big' commits and some economy of scale?  It might not be too bad.
All I can say is that Sybase warm standby is useful, even though the
rep for an update that changes a hundred rows is a hundred updates
keyed on primary key, which is pretty sucky in terms of T-SQL
performance.




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread Hannu Krosing
On Thu, 2008-05-29 at 13:37 -0400, Tom Lane wrote:
 David Fetter [EMAIL PROTECTED] writes:
  On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote:
  The only question I have is... what does this give us that PITR
  doesn't give us?
 
  It looks like a wrapper for PITR to me, so the gain would be ease of
  use.
 
 A couple of points about that:
 
 * Yeah, ease of use is a huge concern here.  We're getting beat up
 because people have to go find a separate package (and figure out
 which one they want), install it, learn how to use it, etc.  It doesn't
 help that the most mature package is Slony which is, um, not very
 novice-friendly or low-admin-complexity.  I personally got religion
 on this about two months ago when Red Hat switched their bugzilla
 from Postgres to MySQL because the admins didn't want to deal with Slony
 any more.  People want simple.
 
 * The proposed approach is trying to get to real replication
 incrementally.  Getting rid of the loss window involved in file-by-file
 log shipping is step one, 

Actually we can already do better than file-by-file by using
pg_xlogfile_name_offset() which was added sometime in 2006. SkyTools for
example does this to get no more than a few seconds failure window.

Doing this synchronously would be of course better.

probably we should use the same modes/protocols as DRBD when
determining when a sync wal write is done

quote from 
http://www.slackworks.com/~dkrovich/DRBD/usingdrbdsetup.html#AEN76


Table 1. DRBD Protocols

  Protocol | Description
  ---------+----------------------------------------------------
  A        | A write operation is complete as soon as the data
           | is written to disk and sent to the network.
  B        | A write operation is complete as soon as a
           | reception acknowledgement arrives.
  C        | A write operation is complete as soon as a write
           | acknowledgement arrives.

There are also additional parameters you can pass to the disk and net
options. See the drbdsetup man page for additional information.

/end quote

 and I suspect that step two is going to be
 fixing performance issues in WAL replay to ensure that slaves can keep
 up.  After that we'd start thinking about how to let slaves run
 read-only queries.  But even without read-only queries, this will be
 a useful improvement for HA/backup scenarios.
 
   regards, tom lane
 




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread Hannu Krosing
On Fri, 2008-05-30 at 15:16 -0400, Robert Treat wrote:
 On Friday 30 May 2008 01:10:20 Tom Lane wrote:
  Greg Smith [EMAIL PROTECTED] writes:
   I fully accept that it may be the case that it doesn't make technical
   sense to tackle them in any order besides sync-read-only slaves because
   of dependencies in the implementation between the two.
 
  Well, it's certainly not been my intention to suggest that no one should
  start work on read-only-slaves before we finish the other part.  The
  point is that I expect the log shipping issues will be done first
  because they're easier, and it would be pointless to not release that
  feature if we had it.
 
  But since you mention it: one of the plausible answers for fixing the
  vacuum problem for read-only slaves is to have the slaves push an xmin
  back upstream to the master to prevent premature vacuuming.  The current
  design of pg_standby is utterly incapable of handling that requirement.
  So there might be an implementation dependency there, depending on how
  we want to solve that problem.
 
 
 Sure, but whose to say that after synchronous wal shipping is finished it 
 wont need a serious re-write due to new needs from the hot standby feature. I 
 think going either way carries some risk. 

The simplest form of synchronous wal shipping would not even need
postgresql running on slave, just a small daemon which reports when wal
blocks are a) received and b) synced to disk. 

This setup would just guarantee no data loss on a single machine
failure.  From there on you could add various features, including
support for both switchover and failover, async replication to multiple
slaves, etc.

The only thing that needs anything additional from the slave's
WAL-receiving daemon is when you want the kind of WAL sync which would
guarantee that a read-only query on the slave, issued after a commit
returns on the master, sees the latest data.  For that kind of guarantee
you need at least feedback about WAL replay, but possibly also shared
transaction numbers and shared snapshots, to be sure that OLTP-type
queries see the latest data and OLAP queries are not denied rows
VACUUMed on the master.

--
Hannu





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-06-01 Thread Hannu Krosing
On Thu, 2008-05-29 at 12:05 -0700, Robert Hodges wrote:
 Hi everyone, 
 
 First of all, I’m absolutely delighted that the PG community is
 thinking seriously about replication.  
 
 Second, having a solid, easy-to-use database availability solution
 that works more or less out of the box would be an enormous benefit to
 customers.  Availability is the single biggest problem for customers
 in my experience and as other people have commented the alternatives
 are not nice.  It’s an excellent idea to build off an existing feature
 —PITR is already pretty useful and the proposed features are solid
 next steps.  The fact that it does not solve all problems is not a
 drawback but means it’s likely to get done in a reasonable timeframe. 
 
 Third, you can’t stop with just this feature.  (This is the BUT part
 of the post.)  The use cases not covered by this feature are actually
 pretty large.  Here are a few that concern me: 
 
 1.) Partial replication. 
 2.) WAN replication. 

1.) and 2.) are better done async; that is the domain of Slony-I/Londiste.

 3.) Bi-directional replication.  (Yes, this is evil but there are
 problems where it is indispensable.) 

Sure, but it is also a lot harder and always has several dimensions
(performance/availability/locking) which play against each other.

 4.) Upgrade support.  Aside from database upgrade (how would this ever
 really work between versions?), it would not support zero-downtime app
 upgrades, which depend on bi-directional replication tricks. 

Or you could use zero-downtime app upgrades which don't depend on
this :P

 5.) Heterogeneous replication. 
 6.) Finally, performance scaling using scale-out over large numbers of
 replicas.  I think it’s possible to get tunnel vision on this—it’s not
 a big requirement in the PG community because people don’t use PG in
 the first place when they want to do this.  They use MySQL, which has
 very good replication for performance scaling, though it’s rather weak
 for availability.  

Again, scale-out over a large number of replicas should either be
async, or for sync use some broadcast channel to all slaves (and it
would still be a performance problem on the master, as it has to wait
for the slowest slave).

 As a consequence, I don’t see how you can get around doing some sort
 of row-based replication like all the other databases. 

Isn't WAL-based replication some sort of row-based replication?

  Now that people are starting to get religion on this issue I would
 strongly advocate a parallel effort to put in a change-set extraction
 API that would allow construction of comprehensive master/slave
 replication. 

Triggers. See PgQ's logtrigga()/logutrigga(). See the slides from Marko
Kreen's presentation at PGCon 2008.

  (Another approach would be to make it possible for third party apps
 to read the logs and regenerate SQL.) 

which logs ? WAL or SQL command logs ?

 There are existing models for how to do change set extraction; we have
 done it several times at my company already.  There are also research
 projects like GORDA that have looked fairly comprehensively at this
 problem.

pgQ with its triggers does a pretty good job of change-set extraction.

--
Hannu





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-31 Thread Dimitri Fontaine
Le vendredi 30 mai 2008, Dimitri Fontaine a écrit :
 This way, no need to switch IP addresses, the clients just connect as usual
 and get results back and do not have to know whether the host they're
 querying against is a slave or a master. This level of smartness is into
 -core.

Oh, and if you want clients to connect to a single IP and hit either the 
master or the slave with some weights to choose one or the other, and a way 
to remove a server from the pool on failure etc., I think using haproxy in 
TCP mode would do it. HaProxy is really nice for this purpose.
  http://haproxy.1wt.eu/
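A minimal sketch of such a haproxy TCP-mode pool might look like the following; the host names, port, and weights are illustrative assumptions, not anything from this thread:

```
# haproxy fragment (illustrative): one front-end IP, weighted spread
# across master and slave, dead servers dropped via health checks.
listen pgsql 0.0.0.0:5432
    mode tcp
    balance roundrobin
    server master db-master:5432 weight 3 check
    server slave1 db-slave1:5432 weight 1 check
```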

Regards,
-- 
dim




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-31 Thread Mike Rylander
On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum
[EMAIL PROTECTED] wrote:
 On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote:
 Andreas 'ads' Scherbaum wrote:
  On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
 
  Well, yes, but you do know about archive_timeout, right? No need to wait
  2 hours.
 
  Then you ship 16 MB binary stuff every 30 second or every minute but
  you only have some kbyte real data in the logfile. This must be taken
  into account, especially if you ship the logfile over the internet
  (means: no high-speed connection, maybe even pay-per-traffic) to the
  slave.

 Sure there's a price to pay. But that doesn't mean the facility doesn't
 exist. And I rather suspect that most of Josh's customers aren't too
 concerned about traffic charges or affected by such bandwidth
 restrictions. Certainly, none of my clients are, and they aren't in the
 giant class. Shipping a 16Mb file, particularly if compressed, every
 minute or so, is not such a huge problem for a great many commercial
 users, and even many domestic users.

 The real problem is not the 16 MB, the problem is: you can't compress
 this file. If the logfile is rotated it still contains all the
 old binary data which is not a good starter for compression.

Using bzip2 in my archive_command script, my WAL files are normally
compressed to between 2MB and 5MB, depending on the write load
(larger, and more of them, in the middle of the day).  bzip2
compression is more expensive, and rotated WAL files are not
particularly compressible to be sure, but given the nature of the
data bzip2 works pretty well, and much better than gzip.
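For concreteness, a wrapper in this spirit might look like the following shell sketch. The script name and archive directory are illustrative assumptions, and the dummy segment here is zero-filled (which is exactly why it compresses so well):

```shell
# Sketch of a bzip2-compressing archive_command wrapper.  PostgreSQL
# would invoke it as:  archive_command = 'wal_bzip.sh %p %f'
# (script name and archive path are illustrative, not from the thread).
archive_dir=$(mktemp -d)     # stand-in for e.g. /var/lib/pgsql/wal_archive

wal_bzip() {
    # $1 = full path to the segment (%p), $2 = bare file name (%f)
    bzip2 -9 -c "$1" > "$archive_dir/$2.bz2"
}

# Demonstrate with a zero-filled 1MB dummy segment.
seg=$(mktemp)
dd if=/dev/zero of="$seg" bs=1024 count=1024 2>/dev/null
wal_bzip "$seg" "000000010000000000000001"
ls -l "$archive_dir"
```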


 So you may have some kB changes in the wal logfile every minute but you
 still copy 16 MB data. Sure, it's not so much - but if you rotate a
 logfile every minute this still transfers 16*60*24 = ~23 GB a day.


I archived 1965 logs yesterday on one instance of my app totalling
8.5GB ... not too bad, really.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: [EMAIL PROTECTED]
 | web: http://www.esilibrary.com



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-31 Thread Merlin Moncure
On Sat, May 31, 2008 at 2:18 AM, Mike Rylander [EMAIL PROTECTED] wrote:
 On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum
 [EMAIL PROTECTED] wrote:
 On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote:
 Andreas 'ads' Scherbaum wrote:
  On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
 
  Well, yes, but you do know about archive_timeout, right? No need to wait
  2 hours.
 
  Then you ship 16 MB binary stuff every 30 second or every minute but
  you only have some kbyte real data in the logfile. This must be taken
  into account, especially if you ship the logfile over the internet
  (means: no high-speed connection, maybe even pay-per-traffic) to the
  slave.

 Sure there's a price to pay. But that doesn't mean the facility doesn't
 exist. And I rather suspect that most of Josh's customers aren't too
 concerned about traffic charges or affected by such bandwidth
 restrictions. Certainly, none of my clients are, and they aren't in the
 giant class. Shipping a 16Mb file, particularly if compressed, every
 minute or so, is not such a huge problem for a great many commercial
 users, and even many domestic users.

 The real problem is not the 16 MB, the problem is: you can't compress
 this file. If the logfile is rotated it still contains all the
 old binary data which is not a good starter for compression.

 Using bzip2 in my archive_command script, my WAL files are normally
 compressed to between 2MB and 5MB, depending on the write load
 (larger, and more of them, in the middle of the day).  bzip2
 compression is more expensive and rotated WAL files are not
 particularly compressable to be sure, but due to (and given) the
 nature of the data bzip2 works pretty well, and much better than gzip.

Compression especially is going to negate one of the big advantages of
WAL shipping, namely that it is a cheap investment in terms of load on
the master.  A gigabit link can ship a lot of log files; you can always
bond links, and 10GigE is coming.  IMO the key trick is to make sure you
don't send the log file more than once from the same source, i.e. a
cascading relay.

merlin



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-31 Thread Joshua D. Drake

Merlin Moncure wrote:

On Sat, May 31, 2008 at 2:18 AM, Mike Rylander [EMAIL PROTECTED] wrote:

On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum



Compression especially is going to negate one of the big advantages of
wal shipping, namely that it is cheap investment in terms of load to
the main.  A gigabit link can ship a lot of log files, you can always


Who has a gigabit link between Dallas and Atlanta? That is the actual 
problem here. Switch-to-switch compression is a waste of time (if you 
aren't running GigE, what are you doing?).


Sincerely,

Joshua D. Drake



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Gurjeet Singh
On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote:

 But since you mention it: one of the plausible answers for fixing the
 vacuum problem for read-only slaves is to have the slaves push an xmin
 back upstream to the master to prevent premature vacuuming.  The current
 design of pg_standby is utterly incapable of handling that requirement.
 So there might be an implementation dependency there, depending on how
 we want to solve that problem.


I think it would be best to not make the slave interfere with the master's
operations; that's only going to increase the operational complexity of such
a solution.

There could be multiple slaves following a master, some serving
data-warehousing queries, some for load-balancing reads, some others just
for disaster recovery, and then some just to mitigate human errors by
re-applying the logs with a delay.

I don't think any one installation would see all of the above mentioned
scenarios, but we need to take care of multiple slaves operating off of a
single master; something similar to cascaded Slony-I.

My two cents.

Best regards,
-- 
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Andrew Sullivan
On Thu, May 29, 2008 at 01:58:34PM -0700, David Fetter wrote:
 
 If people on core had come to the idea that we needed to build in
 replication *before* 8.3 came out, they certainly didn't announce it.
 
 Now is a great time to mention this because it gives everybody time to:
 
 1.  Come to a consensus on what the out-of-the-box replication should
 be, and 
 
 2.  Build, test and debug whatever the consensus out-of-the-box
 replication turns out to be.

None of that is an argument for why this has to go in 8.4.

I argued in Ottawa that the idea that you have to plan a feature for
_the next release_ is getting less tenable with each release.  This is
because major new features for Postgres are now often big and
complicated.  The days of big gains from single victories are mostly
over (though there are exceptions, like HOT).  Postgres is already
mature.  As for the middle-aged person with a mortgage, longer-term
planning is simply a necessary part of life now.

There are two possibilities here.  One is to have huge releases on
much longer timetables.  I think this is unsustainable in a free
project, because people will get bored and go away if they don't get
to use the results of their work in a reasonably short time frame.
The other is to accept that sometimes, planning and development for
new features will have to start a long time before actual release --
maybe planning and some coding for 2 releases out.  That allows large
features like the one we're discussing to be developed responsibly
without making everything else wait for it.

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Marko Kreen
On 5/30/08, Gurjeet Singh [EMAIL PROTECTED] wrote:
 On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote:
  But since you mention it: one of the plausible answers for fixing the
  vacuum problem for read-only slaves is to have the slaves push an xmin
  back upstream to the master to prevent premature vacuuming.  The current
  design of pg_standby is utterly incapable of handling that requirement.
  So there might be an implementation dependency there, depending on how
  we want to solve that problem.

 I think it would be best to not make the slave interfere with the master's
 operations; that's only going to increase the operational complexity of such
 a solution.

I disagree - it's better to consider synchronized WAL slaves
as equal to the master, so having queries there affect the master is OK.

You need to remember that this solution does not try to replace 100-node
Slony-I setups.  You can run sanity checks on slaves or use them to
load-balance read-only OLTP queries, but not random stuff.

 There could be multiple slaves following a master, some serving
 data-warehousing queries, some for load-balancing reads, some others just
 for disaster recovery, and then some just to mitigate human errors by
 re-applying the logs with a delay.

To run warehousing queries you had better use Slony-I / Londiste.  For
warehousing you want different / more indexes on tables anyway,
so I think it's quite OK to say don't do it for complex queries
on WAL slaves.

 I don't think any one installation would see all of the above mentioned
 scenarios, but we need to take care of multiple slaves operating off of a
 single master; something similar to cascaded Slony-I.

Again, the synchronized WAL replication is not generic solution.
Use Slony/Londiste if you want to get totally independent slaves.

Thankfully the -core has set concrete and limited goals,
that means it is possible to see working code in reasonable time.
I think that should apply to read-only slaves too.

If we try to make it handle any kind of load, it will not be finished any time soon.

Now if we limit the scope I've seen 2 variants thus far:

1) Keep slave max in sync, let the load there affect master (xmin).
  - Slave can be used to load-balance OLTP load
  - Slave should not be used for complex queries.

2) If a long query is running, let the slave lag (avoid applying WAL data).
  - Slave cannot be used to load-balance OLTP load
  - Slave can be used for complex queries (although no new indexes
or temp tables can be created).

I think 1) is the more important (and more easily implementable) case.

For 2) we already have solutions (Slony/Londiste/Bucardo, etc)
so there is no point to make effort to solve this here.

-- 
marko



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Andrew Dunstan



Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:
  

On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote:


On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote:
  

But since you mention it: one of the plausible answers for fixing the
vacuum problem for read-only slaves is to have the slaves push an xmin
back upstream to the master to prevent premature vacuuming.


I think it would be best to not make the slave interfere with the
master's operations; that's only going to increase the operational
complexity of such a solution.
  


  

We ruled that out as the-only-solution a while back. It does have the
beauty of simplicity, so it may exist as an option or possibly the only
way, for 8.4.



Yeah.  The point is that it's fairly clear that we could make that work.
A solution that doesn't impact the master at all would be nicer, but
it's not at all clear to me that one is possible, unless we abandon
WAL-shipping as the base technology.


  


Quite. Before we start ruling things out let's know what we think we can 
actually do.


I hope that NTT will release their code ASAP so we will have a better 
idea of what we have and what we need.


cheers

andrew



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote:
 On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote:
 But since you mention it: one of the plausible answers for fixing the
 vacuum problem for read-only slaves is to have the slaves push an xmin
 back upstream to the master to prevent premature vacuuming.

 I think it would be best to not make the slave interfere with the
 master's operations; that's only going to increase the operational
 complexity of such a solution.

 We ruled that out as the-only-solution a while back. It does have the
 beauty of simplicity, so it may exist as an option or possibly the only
 way, for 8.4.

Yeah.  The point is that it's fairly clear that we could make that work.
A solution that doesn't impact the master at all would be nicer, but
it's not at all clear to me that one is possible, unless we abandon
WAL-shipping as the base technology.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Robert Hodges
Hi Tom,

Thanks for the reasoned reply.  As you saw from point #2 in my comments, I
think you should do this feature.  I hope this answers Josh Berkus' concern
about my comments.

You make a very interesting comment which seems to go to the heart of this
design approach:

 About the only thing that would make me want to consider row-based
 replication in core would be if we determine that read-only slave
 queries are impractical atop a WAL-log-shipping implementation.

It's possible I'm misunderstanding some of the implementation issues, but it
is striking that the detailed responses to your proposal list a number of
low-level dependencies between master and slave states when replicating WAL
records.  It appears that you are designing a replication mechanism that
works effectively between a master and a relatively small number of nearby
slaves.  This is clearly an important use case but it also seems clear that
the WAL approach is not a general-purpose approach to replication.  In other
words, you'll incrementally get to that limited end point I describe.  This
will still leave a lot to be desired on read scaling, not to mention many
other cases.

Hence my original comments.  However, rather than harp on that further I
will open up a separate thread to describe a relatively small set of
extensions to PostgreSQL that would be enabling for a wide range of
replication applications.  Contrary to popular opinion these extensions are
actually well understood at the theory level and have been implemented as
prototypes as well as in commercial patches multiple times in different
databases.  Those of us who are deeply involved in replication deserve just
condemnation for not stepping up and getting our thoughts out on the table.

Meanwhile, I would be interested in your reaction to these thoughts on the
scope of the real-time WAL approach.  There's obviously tremendous interest
in this feature.  A general description that goes beyond the NTT slides
would be most helpful for further discussions.

Cheers, Robert

P.s., The NTT slides were really great.  Takahiro and Masao deserve
congratulations on an absolutely first-rate presentation.

On 5/29/08 9:09 PM, Tom Lane [EMAIL PROTECTED] wrote:

 Andrew Sullivan [EMAIL PROTECTED] writes:
 On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
 people are starting to get religion on this issue I would strongly
 advocate a parallel effort to put in a change-set extraction API
 that would allow construction of comprehensive master/slave
 replication.

 You know, I gave a talk in Ottawa just last week about how the last
 effort to develop a comprehensive API for replication failed.

 Indeed, core's change of heart on this issue was largely driven by
 Andrew's talk and subsequent discussion.  We had more or less been
 waiting for the various external replication projects to tell us
 what they wanted in this line, and it was only the realization that
 no such thing was likely to happen that forced us to think seriously
 about what could be done within the core project.

 As I said originally, we have no expectation that the proposed features
 will displace the existing replication projects for high end
 replication problems ... and I'd characterize all of Robert's concerns
 as high end problems.  We are happy to let those be solved outside
 the core project.

 About the only thing that would make me want to consider row-based
 replication in core would be if we determine that read-only slave
 queries are impractical atop a WAL-log-shipping implementation.
 Which could happen; in fact I think that's the main risk of the
 proposed development plan.  But I also think that the near-term
 steps of the plan are worth doing anyway, for various other reasons,
 and so we won't be out too much effort if the plan fails.

 regards, tom lane





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Merlin Moncure
On Fri, May 30, 2008 at 9:31 AM, Marko Kreen [EMAIL PROTECTED] wrote:
 On 5/30/08, Gurjeet Singh [EMAIL PROTECTED] wrote:

 I think it would be best to not make the slave interfere with the master's
 operations; that's only going to increase the operational complexity of such
 a solution.

 I disagree - it's better to consider synchronized WAL slaves
 as equal to the master, so having queries there affect the master is OK.

 You need to remember that this solution does not try to replace 100-node
 Slony-I setups.  You can run sanity checks on slaves or use them to
 load-balance read-only OLTP queries, but not random stuff.

 There could be multiple slaves following a master, some serving
 data-warehousing queries, some for load-balancing reads, some others just
 for disaster recovery, and then some just to mitigate human errors by
 re-applying the logs with a delay.

 To run warehousing queries you had better use Slony-I / Londiste.  For
 warehousing you want different / more indexes on tables anyway,
 so I think it's quite OK to say don't do it for complex queries
 on WAL slaves.

 I don't think any one installation would see all of the above mentioned
 scenarios, but we need to take care of multiple slaves operating off of a
 single master; something similar to cascaded Slony-I.

 Again, the synchronized WAL replication is not generic solution.
 Use Slony/Londiste if you want to get totally independent slaves.

I strongly agree with Gurjeet.  The warm standby replication mechanism
is pretty simple and is wonderfully flexible with the one big
requirement that your clusters have to be mirrors of each other.

Synchronous WAL replication obviously needs some communication channel
from the slave back to the master.  Hopefully, it will be possible to
avoid this for asynchronous shipping.

merlin



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Simon Riggs

On Fri, 2008-05-30 at 11:30 -0400, Andrew Dunstan wrote:
 
 Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:

  On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote:
  
  On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote:

  But since you mention it: one of the plausible answers for fixing the
  vacuum problem for read-only slaves is to have the slaves push an xmin
  back upstream to the master to prevent premature vacuuming.
  
  I think it would be best to not make the slave interfere with the
  master's operations; that's only going to increase the operational
  complexity of such a solution.

  We ruled that out as the-only-solution a while back. It does have the
  beauty of simplicity, so it may exist as an option or possibly the only
  way, for 8.4.
  
  Yeah.  The point is that it's fairly clear that we could make that work.
  A solution that doesn't impact the master at all would be nicer, but
  it's not at all clear to me that one is possible, unless we abandon
  WAL-shipping as the base technology.
 
 Quite. Before we start ruling things out let's know what we think we can 
 actually do.

Let me re-phrase: I'm aware of that possibility and believe we can and
could do it for 8.4. My assessment is that people won't find it
sufficient and I am looking at other alternatives also. There may be a
better one possible for 8.4, there may not. Hence I've said something
in 8.4, something better later. There is no need to decide that is the
only way forward, yet.

I hope and expect to put some of these ideas into a more concrete form,
but this has not yet happened. Nothing has slipped, not having any
trouble getting on with it, just that my plans were to not start it yet.
I think having a detailed design ready for review by September commit
fest is credible.

 I hope that NTT will release their code ASAP so we will have a better 
 idea of what we have and what we need.

That has very little to do with Hot Standby, though there could be patch
conflicts, which is why I'm aiming to get WAL streaming done first.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Simon Riggs

On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote:
 On Fri, May 30, 2008 at 10:40 AM, Tom Lane [EMAIL PROTECTED] wrote:
 But since you mention it: one of the plausible answers for
 fixing the
 vacuum problem for read-only slaves is to have the slaves push
 an xmin
 back upstream to the master to prevent premature vacuuming.
  The current
 design of pg_standby is utterly incapable of handling that
 requirement.
 So there might be an implementation dependency there,
 depending on how
 we want to solve that problem.
 
 I think it would be best to not make the slave interfere with the
 master's operations; that's only going to increase the operational
 complexity of such a solution.
 
 There could be multiple slaves following a master, some serving
 data-warehousing queries, some for load-balancing reads, some others
 just for disaster recovery, and then some just to mitigate human
 errors by re-applying the logs with a delay.

Agreed.

We ruled that out as the-only-solution a while back. It does have the
beauty of simplicity, so it may exist as an option or possibly the only
way, for 8.4.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Andreas 'ads' Scherbaum
On Thu, 29 May 2008 09:22:26 -0700 Steve Atkins wrote:
 On May 29, 2008, at 9:12 AM, David Fetter wrote:
 
  Either one of these would be great, but something that involves
  machines that stay useless most of the time is just not going to work.
 
 I have customers who are thinking about warm standby functionality, and
 the only thing stopping them deploying it is complexity and maintenance,
 not the cost of the HA hardware. If trivial-to-deploy replication that  
 didn't offer read-only access of the slaves were available today I'd bet
 that most of them would be using it.

Sure, have a similar customer. They are right now using a set of
Perl-scripts which ship the logfiles to the slave, take care of the
status, apply the logfiles, validate checksums, etc. The whole thing
works very well in combination with RedHat cluster software, but it
took several weeks to implement the current solution.

Not everyone wants to spend the time and the manpower to implement a
simple replication setup.


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Andreas 'ads' Scherbaum
On Thu, 29 May 2008 18:29:01 -0400 Tom Lane wrote:

 Dimitri Fontaine [EMAIL PROTECTED] writes:
  While at it, would it be possible for the simple part of the core  
  team statement to include automatic failover?
 
 No, I think it would be a useless expenditure of energy.  Failover
 includes a lot of things that are not within our purview: switching
 IP addresses to point to the new server, some kind of STONITH solution
 to keep the original master from coming back to life, etc.  Moreover
 there are already projects/products concerned with those issues.

True words. Failover is not and should not be part of PostgreSQL.

But PG can help the failover solution; as an example, an easy-to-use
interface reporting the current slave status comes to mind. Other
ideas are also possible.


 It might be useful to document where to find solutions to that problem,
 but we can't take it on as part of core Postgres.

Ack


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Andreas 'ads' Scherbaum
On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:

 Well, yes, but you do know about archive_timeout, right? No need to wait 
 2 hours.

Then you ship 16 MB binary stuff every 30 second or every minute but
you only have some kbyte real data in the logfile. This must be taken
into account, especially if you ship the logfile over the internet
(means: no high-speed connection, maybe even pay-per-traffic) to the
slave.
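The knob under discussion here is an ordinary postgresql.conf setting; a fragment with purely illustrative values:

```
# postgresql.conf excerpt (values are examples, not recommendations)
archive_timeout = 60                    # force a segment switch at most
                                        # every 60 seconds
archive_command = 'cp %p /archive/%f'   # whatever ships the file
```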


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Greg Smith

On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:

 Then you ship 16 MB binary stuff every 30 second or every minute but
 you only have some kbyte real data in the logfile.

Not if you use pg_clearxlogtail ( 
http://www.2ndquadrant.com/replication.htm ), which got lost in the giant 
March commitfest queue but should probably wander into contrib as part of 
8.4.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Gurjeet Singh
On Thu, May 29, 2008 at 7:42 PM, Tom Lane [EMAIL PROTECTED] wrote:

 The big problem
 is that long-running slave-side queries might still need tuples that are
 vacuumable on the master, and so replication of vacuuming actions would
 cause the slave's queries to deliver wrong answers.


Another issue with read-only slaves just popped up in my head.

How do we block readers on the slave while it is replaying an ALTER
TABLE or similar command that requires an exclusive lock and potentially
alters the table's structure? Or does WAL replay already take an
exclusive lock on such a table?

Best regards,
-- 
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Simon Riggs

On Fri, 2008-05-30 at 11:12 -0700, Robert Hodges wrote:
 This is clearly an important use case but it also seems clear that
 the WAL approach is not a general-purpose approach to replication.

I think we cannot make such a statement yet, if ever.

I would note that log-based replication is now the mainstay of
commercial database replication techniques for loosely-coupled groups of
servers. It would seem strange to assume that it should not be good for
us too, simply because we know it to be difficult.

IMHO the project has a pretty good track record of delivering
functionality that looked hard at first glance.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Robert Treat
On Thursday 29 May 2008 22:59:21 Merlin Moncure wrote:
 On Thu, May 29, 2008 at 9:26 PM, Josh Berkus [EMAIL PROTECTED] wrote:
  I fully accept that it may be the case that it doesn't make technical
  sense to tackle them in any order besides sync-read-only slaves because
  of dependencies in the implementation between the two.  If that's the
  case, it would be nice to explicitly spell out what that was to deflect
  criticism of the planned prioritization.
 
  There's a very simple reason to prioritize the synchronous log shipping
  first; NTT may open source their solution and we'll get it a lot sooner
  than the other components.

 That's a good argument.  I just read the NTT document and the stuff
 looks fantastic.  You've convinced me...

It would be a better argument if the NTT guys hadn't said that they estimated 
six months before the code would be released, which puts us beyond 8.4. 
Now it is possible that the time frame could be sooner, but unless someone 
already has the patch, this reminds me a little too much of the arguments for 
including Windows support in a single release because we already had a working 
port/patch set to go from. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Andrew Dunstan



Andreas 'ads' Scherbaum wrote:

 On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:

  Well, yes, but you do know about archive_timeout, right? No need to wait
  2 hours.

 Then you ship 16 MB binary stuff every 30 second or every minute but
 you only have some kbyte real data in the logfile. This must be taken
 into account, especially if you ship the logfile over the internet
 (means: no high-speed connection, maybe even pay-per-traffic) to the
 slave.

Sure there's a price to pay. But that doesn't mean the facility doesn't 
exist. And I rather suspect that most of Josh's customers aren't too 
concerned about traffic charges or affected by such bandwidth 
restrictions. Certainly, none of my clients are, and they aren't in the 
giant class. Shipping a 16Mb file, particularly if compressed, every 
minute or so, is not such a huge problem for a great many commercial 
users, and even many domestic users.


cheers

andrew



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Robert Treat
On Thursday 29 May 2008 20:31:31 Greg Smith wrote:
 On Thu, 29 May 2008, Tom Lane wrote:
  There's no point in having read-only slave queries if you don't have a
  trustworthy method of getting the data to them.

 This is a key statement that highlights the difference in how you're
 thinking about this compared to some other people here.  As far as some
 are concerned, the already working log shipping *is* a trustworthy method
 of getting data to the read-only slaves.  There are plenty of applications
 (web oriented ones in particular) where if you could direct read-only
 queries against a slave, the resulting combination would be a giant
 improvement over the status quo even if that slave was as much as
 archive_timeout behind the master.  That quantity of lag is perfectly fine
 for a lot of the same apps that have read scalability issues.

 If you're someone who falls into that camp, the idea of putting the sync
 replication job before the read-only slave one seems really backwards.


Just looking at it from an overall market perspective, synchronous log 
shipping pretty much only addresses failover needs, whereas read-only slaves 
address both failover and scaling issues. (Note I say address, not solve.) 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Robert Treat
On Friday 30 May 2008 01:10:20 Tom Lane wrote:
 Greg Smith [EMAIL PROTECTED] writes:
  I fully accept that it may be the case that it doesn't make technical
  sense to tackle them in any order besides sync-read-only slaves because
  of dependencies in the implementation between the two.

 Well, it's certainly not been my intention to suggest that no one should
 start work on read-only-slaves before we finish the other part.  The
 point is that I expect the log shipping issues will be done first
 because they're easier, and it would be pointless to not release that
 feature if we had it.

 But since you mention it: one of the plausible answers for fixing the
 vacuum problem for read-only slaves is to have the slaves push an xmin
 back upstream to the master to prevent premature vacuuming.  The current
 design of pg_standby is utterly incapable of handling that requirement.
 So there might be an implementation dependency there, depending on how
 we want to solve that problem.


Sure, but who's to say that after synchronous WAL shipping is finished it 
won't need a serious re-write due to new needs from the hot standby feature. I 
think going either way carries some risk. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Gurjeet Singh
On Sat, May 31, 2008 at 1:52 AM, Greg Smith [EMAIL PROTECTED] wrote:

 On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:

  Then you ship 16 MB binary stuff every 30 second or every minute but
 you only have some kbyte real data in the logfile.


 Not if you use pg_clearxlogtail (
 http://www.2ndquadrant.com/replication.htm ), which got lost in the giant
 March commitfest queue but should probably wander into contrib as part of
 8.4.


This means we need to modify pg_standby to not check for filesize when
reading XLogs.

Best regards,

-- 
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Tatsuo Ishii
 Andreas 'ads' Scherbaum wrote:
  On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:

   Well, yes, but you do know about archive_timeout, right? No need to wait
   2 hours.

  Then you ship 16 MB binary stuff every 30 second or every minute but
  you only have some kbyte real data in the logfile. This must be taken
  into account, especially if you ship the logfile over the internet
  (means: no high-speed connection, maybe even pay-per-traffic) to the
  slave.

 
 Sure there's a price to pay. But that doesn't mean the facility doesn't 
 exist. And I rather suspect that most of Josh's customers aren't too 
 concerned about traffic charges or affected by such bandwidth 
 restrictions. Certainly, none of my clients are, and they aren't in the 
 giant class. Shipping a 16Mb file, particularly if compressed, every 
 minute or so, is not such a huge problem for a great many commercial 
 users, and even many domestic users.

Sumitomo Electric Co., Ltd., a company with 20 billion dollars in sales
in Japan (parent company of Sumitomo Electric Information Systems Co.,
Ltd., which is one of the Recursive SQL development support companies),
uses 100 PostgreSQL servers. They are doing backups by using log
shipping to another data center and have problems with the amount of
log data being transferred. They said this is one of the big problems
they have with PostgreSQL and hope it will be solved in the near
future.
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Joshua D. Drake


On Sat, 2008-05-31 at 02:48 +0530, Gurjeet Singh wrote:

  Not if you use pg_clearxlogtail
  ( http://www.2ndquadrant.com/replication.htm ), which got lost
  in the giant March commitfest queue but should probably wander
  into contrib as part of 8.4.

 This means we need to modify pg_standby to not check for filesize when
 reading XLogs.

It does.

Joshua D. Drake




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Greg Smith

On Sat, 31 May 2008, Gurjeet Singh wrote:

  Not if you use pg_clearxlogtail

 This means we need to modify pg_standby to not check for filesize when
 reading XLogs.

No, the idea is that you run the segments through pg_clearxlogtail | gzip, 
which compresses lightly used segments massively because all the 
unused bytes are 0.  The file comes out the same size on the other side, but 
you didn't ship a full 16MB if only a few KB were used.
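
For intuition, here is a small illustration with standard tools (this is not pg_clearxlogtail itself, just a sketch of the effect it enables): a 16 MB file whose unused tail is all zero bytes compresses to almost nothing, while one still full of stale binary data barely compresses at all.

```shell
# Simulate a segment whose unused tail was zeroed vs. one full of
# stale binary data, and compare their gzip'd sizes.
zeroed=$(dd if=/dev/zero   bs=1M count=16 2>/dev/null | gzip -c | wc -c)
stale=$(dd if=/dev/urandom bs=1M count=16 2>/dev/null | gzip -c | wc -c)
echo "zeroed segment compresses to ${zeroed} bytes"   # a few KB
echo "stale segment compresses to  ${stale} bytes"    # still ~16 MB
```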


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Simon Riggs

On Fri, 2008-05-30 at 01:10 -0400, Tom Lane wrote:
 Greg Smith [EMAIL PROTECTED] writes:
  I fully accept that it may be the case that it doesn't make technical 
  sense to tackle them in any order besides sync-read-only slaves because 
  of dependencies in the implementation between the two.
 
 Well, it's certainly not been my intention to suggest that no one should
 start work on read-only-slaves before we finish the other part.  The
 point is that I expect the log shipping issues will be done first
 because they're easier, and it would be pointless to not release that
 feature if we had it.

Agreed.

I'm arriving late to a thread that seems to have grown out of all
proportion.

AFAICS streaming WAL and hot standby are completely orthogonal features.
Streaming WAL is easier, and if NTT can release their code to open source
we may get this in the September commit fest. Hot Standby is harder, and it
was my viewpoint at PGCon that we may not have a perfect working version
of this by the end of 8.4. We are very likely to have something working,
but maybe not the whole feature set we might wish for. I expect
to be actively working on this soon. I definitely want to see WAL
streaming going in as early as possible, and before the end of 8.4, otherwise
code conflicts and other difficulties are likely to push out the 8.4
date and/or Hot Standby.

So as I see it, Tom has only passed on my comments on this, not added or
removed anything. The main part of the announcement was really about
bringing the WAL streaming into core and effectively favouring it over a
range of other projects.

Can we all back off a little on this for now? Various concerns have been
validly expressed, but it will all come good AFAICS.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Dimitri Fontaine
Le vendredi 30 mai 2008, Tom Lane a écrit :
 No, I think it would be a useless expenditure of energy.  Failover
 includes a lot of things that are not within our purview: switching
 IP addresses to point to the new server, some kind of STONITH solution
 to keep the original master from coming back to life, etc.  Moreover
 there are already projects/products concerned with those issues.

Well, I forgot that there's in fact no active plan to put pgbouncer features 
into core at the moment (I'm sure I've read something about this on the lists 
though). If there were, the slave could proxy queries to the master, and stop 
proxying but serve them itself if the master tells it it's dying.
This way there's no need to switch IP addresses; the clients just connect as 
usual and get results back, without having to know whether the host they're 
querying is a slave or a master. This level of smartness belongs in -core.

The STONITH part in case of a known (fatal) failure does not seem that hard 
either, as the master at fatal time could record somewhere that it's now a 
slave and use this at next startup (recovery.conf?). If it can't even do that, 
well, we're back to a crash situation with no automatic failover solution 
provided. Unhandled failure cases will obviously continue to exist.

I'm not asking for all cases to be managed in -core, just for some level 
of effort on the topic. Of course, I'm just the one asking questions and 
trying to raise ideas, so I'm perfectly fine with your current answer 
(useless expenditure of energy), even if I somewhat disagree with the useless 
part of it :)

As for the integrated pgbouncer daemon part, I'm thinking this would allow the 
infamous part 3 of the proposal (read-only slaves) to become pretty simple to 
set up when ready: the slave knows who its master is, and as soon as an XID is 
needed, the transaction's queries are forwarded/proxied to it. Thanks again 
Florian!

 It might be useful to document where to find solutions to that problem,
 but we can't take it on as part of core Postgres.

Even the part where it makes sense (provided it does, and I'm not completely 
off track here)?

Regards,
-- 
dim




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Andreas 'ads' Scherbaum
On Fri, 30 May 2008 16:22:41 -0400 (EDT) Greg Smith wrote:

 On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:
 
  Then you ship 16 MB binary stuff every 30 second or every minute but
  you only have some kbyte real data in the logfile.
 
 Not if you use pg_clearxlogtail ( 
 http://www.2ndquadrant.com/replication.htm ), which got lost in the giant 
 March commitfest queue but should probably wander into contrib as part of 
 8.4.

Yes, this topic has been discussed several times in the past, but
solving it needs a patch/solution integrated into PG itself, not
contrib.


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Simon Riggs
On Thu, 2008-05-29 at 10:12 -0400, Tom Lane wrote:
 The Postgres core team met at PGCon to discuss a few issues, the largest
 of which is the need for simple, built-in replication for PostgreSQL.
 Historically the project policy has been to avoid putting replication
 into core PostgreSQL, so as to leave room for development of competing
 solutions, recognizing that there is no one size fits all replication
 solution.  However, it is becoming clear that this policy is hindering
 acceptance of PostgreSQL to too great an extent, compared to the benefit
 it offers to the add-on replication projects.  Users who might consider
 PostgreSQL are choosing other database systems because our existing
 replication options are too complex to install and use for simple cases.
 In practice, simple asynchronous single-master-multiple-slave
 replication covers a respectable fraction of use cases, so we have
 concluded that we should allow such a feature to be included in the core
 project.  We emphasize that this is not meant to prevent continued
 development of add-on replication projects that cover more complex use
 cases.
 
 We believe that the most appropriate base technology for this is
 probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
 We hope that such a feature can be completed for 8.4.  Ideally this
 would be coupled with the ability to execute read-only queries on the
 slave servers, but we see technical difficulties that might prevent that
 from being completed before 8.5 or even further out.  (The big problem
 is that long-running slave-side queries might still need tuples that are
 vacuumable on the master, and so replication of vacuuming actions would
 cause the slave's queries to deliver wrong answers.)
 
 Again, this will not replace Slony, pgPool, Continuent, Londiste, or
 other systems for many users, as it will be not be highly scalable nor
 support long-distance replication nor replicating less than an entire
 installation.  But it is time to include a simple, reliable basic
 replication feature in the core system.

I'm in full support of this and commend the work of the NTT team.

The goals and timescales are realistic, and setting a timetable in this
way will help planning for many users.

I'm expecting to lead the charge on the Hot Standby project. The problem
mentioned is just one of the issues, though overall I'm now optimistic
about our eventual success in that area. I'm discussing this now with a
couple of sponsors and would welcome serious financial commitments to
this goal. Please contact me off-list if you agree also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Josh Berkus
Andrew,

 Sure there's a price to pay. But that doesn't mean the facility doesn't
 exist. And I rather suspect that most of Josh's customers aren't too
 concerned about traffic charges or affected by such bandwidth
 restrictions. Certainly, none of my clients are, and they aren't in the
 giant class. Shipping a 16Mb file, particularly if compressed, every
 minute or so, is not such a huge problem for a great many commercial
 users, and even many domestic users.

The issue is that when you're talking about telecommunications companies 
(and similar), once a minute isn't adequate.  Those folks want shipping at 
least every second, or better yet, synchronous.

Anyway, this is a pretty pointless discussion given that we want both 
capabilities, and stuff will get implemented in the order that makes 
technical sense.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Fwd: Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Csaba Nagy
[Looks like this mail missed the hackers list on reply to all, I wonder
how it could happen... so I forward it]

On Thu, 2008-05-29 at 17:00 +0100, Dave Page wrote:
 Yes, we're talking real-time streaming (synchronous) log shipping.

Is there any design already how would this be implemented ?

Some time ago I was interested in this subject and thought about having
a normal postgres connection where the slave would issue a query against
a special view which would simply stream WAL records as bytea from a
requested point, without ever finishing. This would have the advantage
(over a custom socket streaming solution) that it reuses the complete
infrastructure of connection making/managing/security (think SSL to
prevent sniffing) of the postgres server. It would also be a public
interface, usable by other tools too (think PITR management
application/WAL stream repository). Another advantage would be that a
PITR solution could serve as a source for the WAL stream too, so the
slave could either get the real-time stream from the master or rebuild
a PITR state from a WAL repository server, using the same interface...

Probably some kind of WAL subscription management should also be
implemented, so that the slave can signal the master which WAL records
it has already applied and which can be recycled on the master; it
would be nice if there could be multiple subscribers at the same time.
A subscriber time-out could also be implemented, marking the
subscription as timed out so that the slave knows it has to rebuild
itself...

Cheers,
Csaba.





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Andreas 'ads' Scherbaum
On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote:
 Andreas 'ads' Scherbaum wrote:
  On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
 
  Well, yes, but you do know about archive_timeout, right? No need to wait 
  2 hours.
 
  Then you ship 16 MB binary stuff every 30 second or every minute but
  you only have some kbyte real data in the logfile. This must be taken
  into account, especially if you ship the logfile over the internet
  (means: no high-speed connection, maybe even pay-per-traffic) to the
  slave.
 
 Sure there's a price to pay. But that doesn't mean the facility doesn't 
 exist. And I rather suspect that most of Josh's customers aren't too 
 concerned about traffic charges or affected by such bandwidth 
 restrictions. Certainly, none of my clients are, and they aren't in the 
 giant class. Shipping a 16Mb file, particularly if compressed, every 
 minute or so, is not such a huge problem for a great many commercial 
 users, and even many domestic users.

The real problem is not the 16 MB; the problem is that you can't
compress this file well. If the logfile is rotated it still contains
all the old binary data, which is not a good starting point for
compression.

So you may have only a few kB of changes in the WAL logfile every
minute, but you still copy 16 MB of data. Sure, that's not so much -
but if you rotate a logfile every minute this still transfers
16*60*24 MB = ~23 GB a day.
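
The arithmetic above, spelled out (assuming one full 16 MB segment shipped per minute):

```shell
# 16 MB per segment, one segment per minute, 60*24 minutes per day
mb_per_day=$((16 * 60 * 24))
echo "${mb_per_day} MB per day"   # 23040 MB, i.e. ~23 GB a day
```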


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Andrew Dunstan



Tatsuo Ishii wrote:

  Andreas 'ads' Scherbaum wrote:
   On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:

    Well, yes, but you do know about archive_timeout, right? No need to wait
    2 hours.

   Then you ship 16 MB binary stuff every 30 second or every minute but
   you only have some kbyte real data in the logfile. This must be taken
   into account, especially if you ship the logfile over the internet
   (means: no high-speed connection, maybe even pay-per-traffic) to the
   slave.

  Sure there's a price to pay. But that doesn't mean the facility doesn't
  exist. And I rather suspect that most of Josh's customers aren't too
  concerned about traffic charges or affected by such bandwidth
  restrictions. Certainly, none of my clients are, and they aren't in the
  giant class. Shipping a 16Mb file, particularly if compressed, every
  minute or so, is not such a huge problem for a great many commercial
  users, and even many domestic users.

 Sumitomo Electric Co., Ltd., a 20 billion dollars selling company in
 Japan (parent company of Sumitomo Electric Information Systems Co.,
 Ltd., which is one of the Recursive SQL development support company)
 uses 100 PostgreSQL servers. They are doing backups by using log
 shipping to another data center and have problems with the amount of
 the transferring log data. They said this is one of the big problems
 they have with PostgreSQL and hope it will be solved in the near
 future.


Excellent data point. Now, what I'd like to know is whether they are 
getting into trouble simply because of the volume of log data generated, 
or because they have a short archive_timeout set. If it's the former 
(which seems more likely) then none of the ideas I have seen so far in 
this discussion seem likely to help, and that would indeed be a major 
issue we should look at. Another question is this: are they being 
overwhelmed by the amount of network traffic generated, or by the 
difficulty of Postgres producers and/or consumers keeping up? If it's 
network traffic, then perhaps compression would help us.

Maybe we need to set some goals for the level of log volume we expect 
to be able to create/send/consume.


cheers

andrew



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-30 Thread Gurjeet Singh
On Sat, May 31, 2008 at 3:41 AM, Greg Smith [EMAIL PROTECTED] wrote:

 On Sat, 31 May 2008, Gurjeet Singh wrote:

  Not if you use pg_clearxlogtail


 This means we need to modify pg_standby to not check for filesize when
 reading XLogs.


 No, the idea is that you run the segments through pg_clearxlogtail | gzip,
 which then compresses lightly used segments massively because all the unused
 bytes are 0.  File comes out the same size at the other side, but you didn't
 ship a full 16MB if there was only a few KB used.


Got it. I remember reading about pg_clearxlogtail on these mailing lists,
but somehow forgot how it actually worked!

-- 
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


[HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Tom Lane
The Postgres core team met at PGCon to discuss a few issues, the largest
of which is the need for simple, built-in replication for PostgreSQL.
Historically the project policy has been to avoid putting replication
into core PostgreSQL, so as to leave room for development of competing
solutions, recognizing that there is no one size fits all replication
solution.  However, it is becoming clear that this policy is hindering
acceptance of PostgreSQL to too great an extent, compared to the benefit
it offers to the add-on replication projects.  Users who might consider
PostgreSQL are choosing other database systems because our existing
replication options are too complex to install and use for simple cases.
In practice, simple asynchronous single-master-multiple-slave
replication covers a respectable fraction of use cases, so we have
concluded that we should allow such a feature to be included in the core
project.  We emphasize that this is not meant to prevent continued
development of add-on replication projects that cover more complex use
cases.

We believe that the most appropriate base technology for this is
probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
We hope that such a feature can be completed for 8.4.  Ideally this
would be coupled with the ability to execute read-only queries on the
slave servers, but we see technical difficulties that might prevent that
from being completed before 8.5 or even further out.  (The big problem
is that long-running slave-side queries might still need tuples that are
vacuumable on the master, and so replication of vacuuming actions would
cause the slave's queries to deliver wrong answers.)

Again, this will not replace Slony, pgPool, Continuent, Londiste, or
other systems for many users, as it will not be highly scalable nor
support long-distance replication nor replicating less than an entire
installation.  But it is time to include a simple, reliable basic
replication feature in the core system.

regards, tom lane



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Marko Kreen
On 5/29/08, Tom Lane [EMAIL PROTECTED] wrote:
 The Postgres core team met at PGCon to discuss a few issues, the largest
  of which is the need for simple, built-in replication for PostgreSQL.
  Historically the project policy has been to avoid putting replication
  into core PostgreSQL, so as to leave room for development of competing
  solutions, recognizing that there is no one size fits all replication
  solution.  However, it is becoming clear that this policy is hindering
  acceptance of PostgreSQL to too great an extent, compared to the benefit
  it offers to the add-on replication projects.  Users who might consider
  PostgreSQL are choosing other database systems because our existing
  replication options are too complex to install and use for simple cases.
  In practice, simple asynchronous single-master-multiple-slave
  replication covers a respectable fraction of use cases, so we have
  concluded that we should allow such a feature to be included in the core
  project.  We emphasize that this is not meant to prevent continued
  development of add-on replication projects that cover more complex use
  cases.

  We believe that the most appropriate base technology for this is
  probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
  We hope that such a feature can be completed for 8.4.

+1

Although I would put it more briefly: we do need a solution for
lossless failover servers, and such a solution needs to live in the core
backend.

  Ideally this
  would be coupled with the ability to execute read-only queries on the
  slave servers, but we see technical difficulties that might prevent that
  from being completed before 8.5 or even further out.  (The big problem
  is that long-running slave-side queries might still need tuples that are
  vacuumable on the master, and so replication of vacuuming actions would
  cause the slave's queries to deliver wrong answers.)

Well, both Slony-I and the upcoming Skytools 3 have the same problem
when cleaning events, and solve it simply by having the slaves report
back their lowest position on the event stream.  I cannot see why the
same approach cannot be applied here too.  Each slave just needs to
report its own longest-open tx as still open to the master.  Yes, it
bloats the master, but there is no way around it.

The only problem could be the plan to vacuum tuples updated in between
a long-running tx and the regular ones, but such behaviour can simply
be turned off.

We could also have an option of an inaccessible slave, for those who
fear bloat on the master.
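The feedback scheme described above, where each slave reports its oldest
open transaction back to the master, can be sketched roughly as follows.
This is an illustrative Python sketch, not PostgreSQL code; the function
name is made up:

```python
def vacuum_horizon(master_oldest_xid, slave_reports):
    """Return the oldest xid whose tuples must still be kept.

    master_oldest_xid: oldest xid of any open transaction on the master.
    slave_reports: oldest open xid reported by each slave (may be empty).
    """
    # Vacuum may only remove tuples older than every open snapshot,
    # including the ones held open by queries running on the slaves.
    return min([master_oldest_xid, *slave_reports])

# A long-running query on one slave (oldest xid 90) holds back vacuum on
# the master even though the master itself only needs xid 120 onwards.
print(vacuum_horizon(120, [150, 90]))  # -> 90

# With no slaves attached, the master's own horizon is used unchanged.
print(vacuum_horizon(120, []))  # -> 120
```

This is the "bloats the master" trade-off: the horizon is held back by
the slowest reader anywhere in the cluster.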

-- 
marko



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread David Fetter
On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote:
 The Postgres core team met at PGCon to discuss a few issues, the
 largest of which is the need for simple, built-in replication for
 PostgreSQL.  Historically the project policy has been to avoid
 putting replication into core PostgreSQL, so as to leave room for
 development of competing solutions, recognizing that there is no
 one size fits all replication solution.  However, it is becoming
 clear that this policy is hindering acceptance of PostgreSQL to too
 great an extent, compared to the benefit it offers to the add-on
 replication projects.  Users who might consider PostgreSQL are
 choosing other database systems because our existing replication
 options are too complex to install and use for simple cases.  In
 practice, simple asynchronous single-master-multiple-slave
 replication covers a respectable fraction of use cases, so we have
 concluded that we should allow such a feature to be included in the
 core project.  We emphasize that this is not meant to prevent
 continued development of add-on replication projects that cover more
 complex use cases.
 
 We believe that the most appropriate base technology for this is
 probably real-time WAL log shipping, as was demoed by NTT OSS at
 PGCon.  We hope that such a feature can be completed for 8.4.

 Ideally this would be coupled with the ability to execute read-only
 queries on the slave servers, but we see technical difficulties that
 might prevent that from being completed before 8.5 or even further
 out.  (The big problem is that long-running slave-side queries might
 still need tuples that are vacuumable on the master, and so
 replication of vacuuming actions would cause the slave's queries to
 deliver wrong answers.)

This part is a deal-killer.  It's a giant up-hill slog to sell warm
standby to those in charge of making resources available because the
warm standby machine consumes SA time, bandwidth, power, rack space,
etc., but provides no tangible benefit, and this feature would have
exactly the same problem.

IMHO, without the ability to do read-only queries on slaves, it's not
worth doing this feature at all.

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Joshua D. Drake


On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote:
 On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote:

 This part is a deal-killer.  It's a giant up-hill slog to sell warm
 standby to those in charge of making resources available because the
 warm standby machine consumes SA time, bandwidth, power, rack space,
 etc., but provides no tangible benefit, and this feature would have
 exactly the same problem.
 
 IMHO, without the ability to do read-only queries on slaves, it's not
 worth doing this feature at all.

The only question I have is... what does this give us that PITR doesn't
give us?

Sincerely,

Joshua D. Drake





Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Josh Berkus

Marko,


But Tom's mail gave me the impression that core wants to wait until we
get a perfect read-only slave implementation, so we would wait until
8.6, which does not seem sensible.  If we can do a slightly inefficient
(but simple) implementation right now, I see no reason to reject it; we
can always improve it later.


That's incorrect.  We're looking for a workable solution.  If we could 
get one for 8.4, that would be brilliant but we think it's going to be 
harder than that.


Publishing the XIDs back to the master is one possibility.  We also 
looked at using spillover segments for vacuumed rows, but that seemed 
even less viable.


I'm also thinking, for *async replication*, that we could simply halt 
replication on the slave whenever a transaction passes minxid on the 
master.  However, the main focus will be on synchronous hot standby.


--Josh




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread David Fetter
On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote:
 On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote:
  This part is a deal-killer.  It's a giant up-hill slog to sell
  warm standby to those in charge of making resources available
  because the warm standby machine consumes SA time, bandwidth,
  power, rack space, etc., but provides no tangible benefit, and
  this feature would have exactly the same problem.
  
  IMHO, without the ability to do read-only queries on slaves, it's
  not worth doing this feature at all.
 
 The only question I have is... what does this give us that PITR
 doesn't give us?

It looks like a wrapper for PITR to me, so the gain would be ease of
use.

Cheers,
David.
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Douglas McNaught
On Thu, May 29, 2008 at 11:46 AM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 The only question I have is... what does this give us that PITR doesn't
 give us?

I think the idea is that WAL records would be shipped (possibly via
socket) and applied as they're generated, rather than on a
file-by-file basis.  At least that's what real-time implies to me...
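The distinction can be sketched roughly as follows; this is an
illustrative Python comparison with hypothetical helper names, not
PostgreSQL code.  File-based shipping waits for a full WAL segment,
while streaming sends each record as soon as it is generated:

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size in bytes

def ship_file_based(records, send_segment):
    """Buffer WAL records and ship only completed segments (classic PITR).

    Returns the unshipped tail -- the window of data that would be lost
    if the master died before the current segment filled up.
    """
    buf = bytearray()
    for rec in records:
        buf += rec
        while len(buf) >= SEGMENT_SIZE:
            send_segment(bytes(buf[:SEGMENT_SIZE]))
            del buf[:SEGMENT_SIZE]
    return bytes(buf)

def ship_streaming(records, send_record):
    """Ship every record as soon as it is generated ('real-time')."""
    for rec in records:
        send_record(rec)
```

The tail returned by `ship_file_based` is exactly the data-loss window
that real-time streaming is meant to eliminate.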

-Doug



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Bruce Momjian
Josh Berkus wrote:
 Marko,
 
  But Tom's mail gave me the impression that core wants to wait until
  we get a perfect read-only slave implementation, so we would wait
  until 8.6, which does not seem sensible.  If we can do a slightly
  inefficient (but simple) implementation right now, I see no reason to
  reject it; we can always improve it later.
 
 That's incorrect.  We're looking for a workable solution.  If we could 
 get one for 8.4, that would be brilliant but we think it's going to be 
 harder than that.
 
 Publishing the XIDs back to the master is one possibility.  We also 
 looked at using spillover segments for vacuumed rows, but that seemed 
 even less viable.
 
 I'm also thinking, for *async replication*, that we could simply halt 
 replication on the slave whenever a transaction passes minxid on the 
 master.  However, the main focus will be on synchronous hot standby.

Another idea I discussed with Tom is having the slave _delay_ applying
WAL files until all slave snapshots are ready.
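A rough sketch of that delay-apply idea follows; this is illustrative
Python with made-up names, not the actual recovery code.  The slave
queues incoming WAL records and pauses at a cleanup record whenever a
local snapshot could still see the tuples it would remove:

```python
from collections import deque

def apply_wal(queue, cleanup_xid_of, oldest_snapshot_xid, apply):
    """Apply queued WAL records, pausing at conflicting cleanup records.

    cleanup_xid_of(rec): newest xid whose tuples the record removes,
        or None for records that remove nothing.
    oldest_snapshot_xid(): oldest xid any local query still needs.
    apply(rec): actually replay the record on the slave.
    """
    while queue:
        rec = queue[0]
        xid = cleanup_xid_of(rec)
        if xid is not None and xid >= oldest_snapshot_xid():
            # Delay: a slave snapshot could still see these tuples.
            break
        apply(queue.popleft())
```

Replay simply resumes once the blocking snapshots finish and
`oldest_snapshot_xid()` advances past the cleanup record.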

-- 
  Bruce Momjian  [EMAIL PROTECTED]http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Dave Page
On Thu, May 29, 2008 at 4:48 PM, Douglas McNaught [EMAIL PROTECTED] wrote:
 On Thu, May 29, 2008 at 11:46 AM, Joshua D. Drake [EMAIL PROTECTED] wrote:

 The only question I have is... what does this give us that PITR doesn't
 give us?

 I think the idea is that WAL records would be shipped (possibly via
 socket) and applied as they're generated, rather than on a
 file-by-file basis.  At least that's what real-time implies to me...

Yes, we're talking real-time streaming (synchronous) log shipping.

-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Rick Vernam
On Thursday 29 May 2008 09:54:03 am Marko Kreen wrote:
 On 5/29/08, Tom Lane [EMAIL PROTECTED] wrote:
  The Postgres core team met at PGCon to discuss a few issues, the largest
   of which is the need for simple, built-in replication for PostgreSQL.
   Historically the project policy has been to avoid putting replication
   into core PostgreSQL, so as to leave room for development of competing
   solutions, recognizing that there is no one size fits all replication
   solution.  However, it is becoming clear that this policy is hindering
   acceptance of PostgreSQL to too great an extent, compared to the benefit
   it offers to the add-on replication projects.  Users who might consider
   PostgreSQL are choosing other database systems because our existing
   replication options are too complex to install and use for simple cases.
   In practice, simple asynchronous single-master-multiple-slave
   replication covers a respectable fraction of use cases, so we have
   concluded that we should allow such a feature to be included in the core
   project.  We emphasize that this is not meant to prevent continued
   development of add-on replication projects that cover more complex use
   cases.
 
   We believe that the most appropriate base technology for this is
   probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
   We hope that such a feature can be completed for 8.4.

 +1

 Although I would put it more briefly: we do need a solution for
 lossless failover servers, and such a solution needs to live in the
 core backend.

+1 for lossless failover (ie, synchronous)



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Marko Kreen
On 5/29/08, David Fetter [EMAIL PROTECTED] wrote:
 On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote:
   Ideally this would be coupled with the ability to execute read-only
   queries on the slave servers, but we see technical difficulties that
   might prevent that from being completed before 8.5 or even further
   out.  (The big problem is that long-running slave-side queries might
   still need tuples that are vacuumable on the master, and so
   replication of vacuuming actions would cause the slave's queries to
   deliver wrong answers.)

 This part is a deal-killer.  It's a giant up-hill slog to sell warm
  standby to those in charge of making resources available because the
  warm standby machine consumes SA time, bandwidth, power, rack space,
  etc., but provides no tangible benefit, and this feature would have
  exactly the same problem.

  IMHO, without the ability to do read-only queries on slaves, it's not
  worth doing this feature at all.

I would not be so harsh - I'd like to have the lossless standby even
without read-only slaves.

But Tom's mail gave me the impression that core wants to wait until we
get a perfect read-only slave implementation, so we would wait until
8.6, which does not seem sensible.  If we can do a slightly inefficient
(but simple) implementation right now, I see no reason to reject it; we
can always improve it later.

Especially as it can be switchable.  And we could also have a
transaction_timeout parameter on slaves so the hit on the master is
limited.

-- 
marko



Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Aidan Van Dyk
* Josh Berkus [EMAIL PROTECTED] [080529 11:52]:
 Marko,
 
 But Tom's mail gave me the impression that core wants to wait until we
 get a perfect read-only slave implementation, so we would wait until
 8.6, which does not seem sensible.  If we can do a slightly inefficient
 (but simple) implementation right now, I see no reason to reject it; we
 can always improve it later.
 
 That's incorrect.  We're looking for a workable solution.  If we could 
 get one for 8.4, that would be brilliant but we think it's going to be 
 harder than that.
 
 Publishing the XIDs back to the master is one possibility.  We also 
 looked at using spillover segments for vacuumed rows, but that seemed 
 even less viable.
 
 I'm also thinking, for *async replication*, that we could simply halt 
 replication on the slave whenever a transaction passes minxid on the 
 master.  However, the main focus will be on synchronous hot standby.

Or, instead of a statement timeout killing statements on the RO slave,
simply kill any old transactions on the RO slave.  Old in the sense
that the master's xmin has passed them.  And it's just an exercise in
controlling the age of xmin on the master, which could even be done
user-side.

Doesn't fit all, but no one size does...  It would work where you're
hammering your slaves with a diverse set of high-velocity short queries
that you're trying to keep off the master...

An option to pause replay (making it async) or abort transactions
(for sync) might make it possible to easily run an async slave for slow
reporting queries and a sync slave for short queries.
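The cancellation policy described here could look roughly like this
(hypothetical Python illustration, not PostgreSQL code): whenever the
master's xmin advances past a slave transaction's snapshot, that
transaction is cancelled rather than holding up replay.

```python
def transactions_to_cancel(master_xmin, slave_snapshots):
    """Return ids of slave transactions too old to keep running.

    master_xmin: oldest xid the master still considers in use.
    slave_snapshots: mapping of slave transaction id -> snapshot xmin.
    """
    # Any snapshot older than the master's xmin may need tuples that
    # the master has already allowed vacuum to remove.
    return [txid for txid, snap_xmin in slave_snapshots.items()
            if snap_xmin < master_xmin]

# Transaction 7 started long ago (snapshot xmin 50) and gets cancelled
# once the master's xmin has advanced to 80; transaction 9 survives.
print(transactions_to_cancel(80, {7: 50, 9: 95}))  # -> [7]
```

In the user-side variant suggested above, the same check would simply
drive a script that issues cancellations against old slave backends.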

a.

-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] Core team statement on replication in PostgreSQL

2008-05-29 Thread Josh Berkus

Joshua D. Drake wrote:


On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote:

On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote:



This part is a deal-killer.  It's a giant up-hill slog to sell warm
standby to those in charge of making resources available because the
warm standby machine consumes SA time, bandwidth, power, rack space,
etc., but provides no tangible benefit, and this feature would have
exactly the same problem.

IMHO, without the ability to do read-only queries on slaves, it's not
worth doing this feature at all.


The only question I have is... what does this give us that PITR doesn't
give us?


Since people seem to be unclear on what we're proposing:

8.4 Synchronous Warm Standby: makes PostgreSQL more suitable for HA 
systems by eliminating failover data loss and cutting failover time.


8.5 (probably) Synchronous & Asynchronous Hot Standby: adds read-only 
queries on slaves to the above.


Again, if we can implement queries on slaves for 8.4, we're all for it. 
 However, after conversations in Core and with Simon we all think it's 
going to be too big a task to complete in 4-5 months.  We *don't* want 
to end up delaying 8.4 for 5 months because we're debugging hot standby.


--Josh



