Re: [HACKERS] logical changeset generation v6

2014-04-24 Thread Magnus Hagander
On Mon, Sep 23, 2013 at 7:03 PM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Sep 23, 2013 at 9:54 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  I still find it weird/inconsistent to have:
  * pg_receivexlog
  * pg_recvlogical
  binaries, even from the same source directory. Why once pg_recv and
  once pg_receive?

 +1


Digging up a really old thread, since I just got annoyed by the inconsistent
naming for the first time myself :)

I can't find that this discussion actually came to a proper consensus, but
I may be missing something. Did we go with pg_recvlogical just because we
couldn't decide on a better name, or did we intentionally decide it was the
best?

I definitely think pg_receivelogical would be a better name, for
consistency (because it's way too late to rename pg_receivexlog of course -
once released, that can't really change. Which is why, *if* we want to
change the name of pg_recvlogical, we have a few more days to make a
decision..)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] logical changeset generation v6

2014-04-24 Thread Andres Freund
On 2014-04-24 09:39:21 +0200, Magnus Hagander wrote:
 I can't find that this discussion actually came to a proper consensus, but
 I may be missing something. Did we go with pg_recvlogical just because we
 couldn't decide on a better name, or did we intentionally decide it was the
 best?

I went with pg_recvlogical because that's where the (small) majority
seemed to be. Even if I was unconvinced. There were so many outstanding
big fights at that point that I didn't want to spend my time on this ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2014-04-24 Thread Magnus Hagander
On Thu, Apr 24, 2014 at 9:43 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-04-24 09:39:21 +0200, Magnus Hagander wrote:
  I can't find that this discussion actually came to a proper consensus,
 but
  I may be missing something. Did we go with pg_recvlogical just because we
  couldn't decide on a better name, or did we intentionally decide it was
 the
  best?

 I went with pg_recvlogical because that's where the (small) majority
 seemed to be. Even if I was unconvinced. There were so many outstanding
 big fights at that point that I didn't want to spend my time on this ;)


I was guessing something like the second part there, which is why I figured
this would be a good time to bring this fight back up to the surface ;)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] logical changeset generation v6

2014-04-24 Thread Andres Freund
On 2014-04-24 09:46:07 +0200, Magnus Hagander wrote:
 On Thu, Apr 24, 2014 at 9:43 AM, Andres Freund and...@2ndquadrant.com wrote:
 
  On 2014-04-24 09:39:21 +0200, Magnus Hagander wrote:
   I can't find that this discussion actually came to a proper consensus,
  but
   I may be missing something. Did we go with pg_recvlogical just because we
   couldn't decide on a better name, or did we intentionally decide it was
  the
   best?
 
  I went with pg_recvlogical because that's where the (small) majority
  seemed to be. Even if I was unconvinced. There were so many outstanding
  big fights at that point that I didn't want to spend my time on this ;)
 
 
 I was guessing something like the second part there, which is why I figured
 this would be a good time to bring this fight back up to the surface ;)

I have to admit that I still don't care too much. By now I'd tentatively
want to stay with the current name because that's what I got used to,
but if somebody else has strong opinions and finds consensus...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6

2013-09-28 Thread Steve Singer

On 09/27/2013 05:18 PM, Andres Freund wrote:

Hi Steve,

On 2013-09-27 17:06:59 -0400, Steve Singer wrote:

I've determined that in this test the walsender seems to be hitting
this when it is decoding the transactions that are behind the slonik
commands to add tables to replication (set add table, set add sequence).
This is before the SUBSCRIBE SET is submitted.

I've also noticed something else that is strange (but might be unrelated).
If I stop my slon process and restart it I get messages like:

WARNING:  Starting logical replication from 0/a9321360
ERROR:  cannot stream from 0/A9321360, minimum is 0/A9320B00

Where 0/A9321360 was sent in the last packet my slon received from the
walsender before the restart.

Uh, that looks like I fumbled some comparison. Let me check.
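As a side note, the LSN in that error is actually newer than the stated minimum (0xA9321360 > 0xA9320B00), which fits the theory of an inverted comparison. A rough sketch of the kind of check involved, in Python with invented names (the real walsender code is C and structured differently):

```python
def parse_lsn(text):
    """Parse an LSN of the form 'X/Y' (two hex halves) into a 64-bit int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def check_start_point(requested, minimum):
    """Reject a requested start point older than what can still be decoded."""
    if parse_lsn(requested) < parse_lsn(minimum):
        raise ValueError(
            "cannot stream from %s, minimum is %s" % (requested, minimum))

# The requested LSN from the error message is newer than the stated
# minimum, so a correctly-oriented comparison should accept it:
assert parse_lsn("0/A9321360") > parse_lsn("0/A9320B00")
check_start_point("0/A9321360", "0/A9320B00")  # should not raise
```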


I've further narrowed this down to something (or some combination) that
the _disorder_replica.altertableaddTriggers(1) stored function does
(i.e. @SLONYNAMESPACE@.altertableaddTriggers(int)).

Which is essentially
* Get an exclusive lock on sl_config_lock
* Get an exclusive lock on the user table in question
* create a trigger (the deny access trigger)
* create a truncate trigger
* create a deny truncate trigger

I am not yet able to replicate the error by issuing the same SQL commands
from psql, but I must be missing something.

I can replicate this when just using the test_decoding plugin.

Thanks. That should get me started with debugging. Unless it's possibly
fixed in the latest version, one bug fixed there might cause something
like this if the moon stands exactly right?


The latest version has NOT fixed the problem.

Also, I was a bit inaccurate in my previous descriptions. To clarify:

1.   I sometimes am getting that 'unexpected duplicate' error
2.   The 'set add table' command, which triggers those functions that create 
and configure triggers, is actually causing the walsender to hit the 
following assertion

#2  0x00773d47 in ExceptionalCondition (
    conditionName=conditionName@entry=0x8cf400 "!(ent->cmin == change->tuplecid.cmin)",
    errorType=errorType@entry=0x7ab830 "FailedAssertion",
    fileName=fileName@entry=0x8cecc3 "reorderbuffer.c",
    lineNumber=lineNumber@entry=1162) at assert.c:54
#3  0x00665480 in ReorderBufferBuildTupleCidHash (txn=0x1b6e610,
    rb=<optimized out>) at reorderbuffer.c:1162
#4  ReorderBufferCommit (rb=0x1b6e4f8, xid=<optimized out>,
    commit_lsn=3461001952, end_lsn=<optimized out>) at reorderbuffer.c:1285
#5  0x0065f0f7 in DecodeCommit (xid=<optimized out>,
    nsubxacts=<optimized out>, sub_xids=<optimized out>, ninval_msgs=16,
    msgs=0x1b637c0, buf=0x7fff54d01530, buf=0x7fff54d01530, ctx=0x1adb928,
    ctx=0x1adb928) at decode.c:477
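The invariant that fails can be modeled roughly like this (Python toy; the names and structures are simplified guesses, not the real reorderbuffer.c code): the hash is keyed by (relfilenode, ctid), and a duplicate entry for the same tuple is only tolerated if it carries the same cmin:

```python
def build_tuplecid_hash(changes):
    """Toy model of mapping (relfilenode, ctid) -> cmin/cmax for a txn."""
    mapping = {}
    for change in changes:
        key = (change["relfilenode"], change["ctid"])
        if key in mapping:
            ent = mapping[key]
            # This corresponds to the failing assertion: a duplicate entry
            # for the same tuple must carry the same cmin.
            assert ent["cmin"] == change["cmin"], "unexpected duplicate"
            ent["cmax"] = change["cmax"]  # a later command may update cmax
        else:
            mapping[key] = {"cmin": change["cmin"], "cmax": change["cmax"]}
    return mapping

# Two changes to the same tuple with matching cmin are fine:
ok = build_tuplecid_hash([
    {"relfilenode": 16384, "ctid": (0, 1), "cmin": 2, "cmax": 3},
    {"relfilenode": 16384, "ctid": (0, 1), "cmin": 2, "cmax": 5},
])
```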


I had added an assert(false) to the code where the 'unexpected duplicate'
error was logged to make spotting this easier, but yesterday I didn't
double-check whether I was hitting the assertion I added or this other
one. I can't yet say if these are two unrelated issues or if I'd get the
'unexpected duplicate' message immediately afterwards.


Re: [HACKERS] logical changeset generation v6

2013-09-27 Thread Steve Singer

On 09/26/2013 02:47 PM, Steve Singer wrote:



I've determined that in this test the walsender seems to be hitting
this when it is decoding the transactions that are behind the slonik
commands to add tables to replication (set add table, set add
sequence). This is before the SUBSCRIBE SET is submitted.


I've also noticed something else that is strange (but might be 
unrelated).  If I stop my slon process and restart it I get messages 
like:


WARNING:  Starting logical replication from 0/a9321360
ERROR:  cannot stream from 0/A9321360, minimum is 0/A9320B00

Where 0/A9321360 was sent in the last packet my slon received from the 
walsender before the restart.


If I force it to restart replication from 0/A9320B00, I see data rows
that I appear to have already seen before the restart. I think this
happens when my slon is killed after processing the data for 0/A9320B00
but before its feedback message is sent. Is this expected?
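That replay behaviour amounts to at-least-once delivery: if feedback is lost, the stream resumes from an older position and redelivers, so the receiving side has to track the last LSN it durably applied and filter redeliveries. A Python toy model with invented names (not the slon code):

```python
class Applier:
    """Toy at-least-once consumer that deduplicates by last-applied LSN."""
    def __init__(self):
        self.last_applied = 0   # would be durably stored on the receiver
        self.applied_rows = []

    def apply(self, lsn, row):
        if lsn <= self.last_applied:
            return False        # already seen before the crash; skip
        self.applied_rows.append(row)
        self.last_applied = lsn # persist, then send feedback upstream
        return True

a = Applier()
assert a.apply(0xA9320B00, "row1")      # first delivery
# ...client is killed before its feedback reaches the walsender...
# on restart the stream resumes from 0/A9320B00 and redelivers:
assert not a.apply(0xA9320B00, "row1")  # duplicate filtered out
assert a.apply(0xA9321360, "row2")
```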





I've further narrowed this down to something (or some combination) that
the _disorder_replica.altertableaddTriggers(1) stored function does
(i.e. @SLONYNAMESPACE@.altertableaddTriggers(int)).

Which is essentially
* Get an exclusive lock on sl_config_lock
* Get an exclusive lock on the user table in question
* create a trigger (the deny access trigger)
* create a truncate trigger
* create a deny truncate trigger

I am not yet able to replicate the error by issuing the same SQL 
commands from psql, but I must be missing something.


I can replicate this when just using the test_decoding plugin.


Re: [HACKERS] logical changeset generation v6

2013-09-27 Thread Andres Freund
Hi Steve,

On 2013-09-27 17:06:59 -0400, Steve Singer wrote:
 I've determined that in this test the walsender seems to be hitting
 this when it is decoding the transactions that are behind the slonik
 commands to add tables to replication (set add table, set add sequence).
 This is before the SUBSCRIBE SET is submitted.
 
 I've also noticed something else that is strange (but might be unrelated).
 If I stop my slon process and restart it I get messages like:
 
 WARNING:  Starting logical replication from 0/a9321360
 ERROR:  cannot stream from 0/A9321360, minimum is 0/A9320B00
 
 Where 0/A9321360 was sent in the last packet my slon received from the
 walsender before the restart.

Uh, that looks like I fumbled some comparison. Let me check.

 I've further narrowed this down to something (or some combination) that
 the _disorder_replica.altertableaddTriggers(1) stored function does
 (i.e. @SLONYNAMESPACE@.altertableaddTriggers(int)).
 
 Which is essentially
 * Get an exclusive lock on sl_config_lock
 * Get an exclusive lock on the user table in question
 * create a trigger (the deny access trigger)
 * create a truncate trigger
 * create a deny truncate trigger
 
 I am not yet able to replicate the error by issuing the same SQL commands
 from psql, but I must be missing something.
 
 I can replicate this when just using the test_decoding plugin.

Thanks. That should get me started with debugging. Unless it's possibly
fixed in the latest version, one bug fixed there might cause something
like this if the moon stands exactly right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6

2013-09-26 Thread Steve Singer

On 09/25/2013 01:20 PM, Steve Singer wrote:

On 09/25/2013 11:08 AM, Andres Freund wrote:

On 2013-09-25 11:01:44 -0400, Steve Singer wrote:

On 09/17/2013 10:31 AM, Andres Freund wrote:
This patch set now fails to apply because of the commit Rename 
various

freeze multixact variables.
And I am even partially guilty for that patch...

Rebased patches attached.
While testing the logical replication changes against my WIP logical 
slony I

am sometimes getting error messages from the WAL sender of the form:
unexpected duplicate for tablespace  X relfilenode  X

Any chance you could provide a setup to reproduce the error?



The steps to build a setup that should reproduce this error are:

1.  I had to apply the attached patch on top of your logical replication 
branch so my pg_decode_init would know if it was being called as part 
of an INIT_REPLICATION or a START_REPLICATION.
Unless I have misunderstood something, you will probably want to merge 
this fix in.


2.  Get my WIP for adding logical support to slony from: 
g...@github.com:ssinger/slony1-engine.git branch logical_repl 
(4af1917f8418a)
(My code changes to slony are more prototype level code quality than 
production code quality)


3.
cd slony1-engine
./configure --with-pgconfigdir=/usr/local/pg94wal/bin (or whatever)
make
make install

4.   Grab the clustertest framework JAR from 
https://github.com/clustertest/clustertest-framework and build up a 
clustertest jar file


5.  Create a file
slony1-engine/clustertest/conf/java.conf
that contains the path to the above JAR file  as a shell variable 
assignment: ie
CLUSTERTESTJAR=/home/ssinger/src/clustertest/clustertest_git/build/jar/clustertest-coordinator.jar 



6.
cp clustertest/conf/disorder.properties.sample 
clustertest/conf/disorder.properties



edit disorder.properties to have the proper values for your 
environment. All 6 databases can point at the same postgres instance; 
this test will only actually use 2 of them (so far).


7. Run the test
cd clustertest
./run_all_disorder_tests.sh

This involves having the slon connect to the walsender on the database 
test1 and replicate the data into test2 (which is a different database 
on the same postmaster)


If this setup seems like too much effort I can request one of the 
commitfest VMs from Josh and get everything set up there for you.


Steve


Any ideas?

I'll look into it. Could you provide any context to what you're doing
that's being decoded?




I've determined that in this test the walsender seems to be hitting
this when it is decoding the transactions that are behind the slonik
commands to add tables to replication (set add table, set add
sequence). This is before the SUBSCRIBE SET is submitted.


I've also noticed something else that is strange (but might be 
unrelated).  If I stop my slon process and restart it I get messages like:


WARNING:  Starting logical replication from 0/a9321360
ERROR:  cannot stream from 0/A9321360, minimum is 0/A9320B00

Where 0/A9321360 was sent in the last packet my slon received from the 
walsender before the restart.


If I force it to restart replication from 0/A9320B00, I see data rows
that I appear to have already seen before the restart. I think this
happens when my slon is killed after processing the data for 0/A9320B00
but before its feedback message is sent. Is this expected?







Re: [HACKERS] logical changeset generation v6

2013-09-25 Thread Steve Singer

On 09/17/2013 10:31 AM, Andres Freund wrote:

This patch set now fails to apply because of the commit "Rename various
freeze multixact variables".
And I am even partially guilty for that patch...

Rebased patches attached.


While testing the logical replication changes against my WIP logical 
slony I am sometimes getting error messages from the WAL sender of the form:

unexpected duplicate for tablespace X relfilenode X

The call stack is

HeapTupleSatisfiesMVCCDuringDecoding
heap_hot_search_buffer
index_fetch_heap
index_getnext
systable_getnext
RelidByRelfilenode
ReorderBufferCommit
 DecodeCommit
.
.
.


I am working off something based on your version 
e0acfeace6d695c229efd5d78041a1b734583431



Any ideas?




Re: [HACKERS] logical changeset generation v6

2013-09-25 Thread Andres Freund
On 2013-09-25 11:01:44 -0400, Steve Singer wrote:
 On 09/17/2013 10:31 AM, Andres Freund wrote:
 This patch set now fails to apply because of the commit Rename various
 freeze multixact variables.
 And I am even partially guilty for that patch...
 
 Rebased patches attached.
 
 While testing the logical replication changes against my WIP logical slony I
 am sometimes getting error messages from the WAL sender of the form:
 unexpected duplicate for tablespace  X relfilenode  X

Any chance you could provide a setup to reproduce the error?

 Any ideas?

I'll look into it. Could you provide any context to what you're doing
that's being decoded?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6

2013-09-25 Thread Steve Singer

On 09/25/2013 11:08 AM, Andres Freund wrote:

On 2013-09-25 11:01:44 -0400, Steve Singer wrote:

On 09/17/2013 10:31 AM, Andres Freund wrote:

This patch set now fails to apply because of the commit Rename various
freeze multixact variables.
And I am even partially guilty for that patch...

Rebased patches attached.

While testing the logical replication changes against my WIP logical slony I
am sometimes getting error messages from the WAL sender of the form:
unexpected duplicate for tablespace  X relfilenode  X

Any chance you could provide a setup to reproduce the error?



The steps to build a setup that should reproduce this error are:

1.  I had to apply the attached patch on top of your logical replication 
branch so my pg_decode_init would know if it was being called as part of 
an INIT_REPLICATION or a START_REPLICATION.
Unless I have misunderstood something, you will probably want to merge 
this fix in.


2.  Get my WIP for adding logical support to slony from: 
g...@github.com:ssinger/slony1-engine.git branch logical_repl  
(4af1917f8418a)
(My code changes to slony are more prototype level code quality than 
production code quality)


3.
cd slony1-engine
./configure --with-pgconfigdir=/usr/local/pg94wal/bin (or whatever)
make
make install

4.   Grab the clustertest framework JAR from 
https://github.com/clustertest/clustertest-framework and build up a 
clustertest jar file


5.  Create a file
slony1-engine/clustertest/conf/java.conf
that contains the path to the above JAR file  as a shell variable 
assignment: ie

CLUSTERTESTJAR=/home/ssinger/src/clustertest/clustertest_git/build/jar/clustertest-coordinator.jar

6.
cp clustertest/conf/disorder.properties.sample 
clustertest/conf/disorder.properties



edit disorder.properties to have the proper values for your 
environment. All 6 databases can point at the same postgres instance; 
this test will only actually use 2 of them (so far).


7. Run the test
cd clustertest
./run_all_disorder_tests.sh

This involves having the slon connect to the walsender on the database 
test1 and replicate the data into test2 (which is a different database 
on the same postmaster)


If this setup seems like too much effort I can request one of the 
commitfest VMs from Josh and get everything set up there for you.


Steve


Any ideas?

I'll look into it. Could you provide any context to what you're doing
that's being decoded?

From 9efae980c357d3b75c6d98204ed75b8d29ed6385 Mon Sep 17 00:00:00 2001
From: Steve Singer ssin...@ca.afilias.info
Date: Mon, 16 Sep 2013 10:21:39 -0400
Subject: [PATCH] set the init parameter to true when performing an INIT
 REPLICATION

---
 src/backend/replication/walsender.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2187d96..1b8f289 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -743,7 +743,7 @@ InitLogicalReplication(InitLogicalReplicationCmd *cmd)
 	sendTimeLine = ThisTimeLineID;
 
 	initStringInfo(output_message);
-	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true, InvalidXLogRecPtr,
 	   NIL,	replay_read_page,
 	   WalSndPrepareWrite, WalSndWriteData);
 
-- 
1.7.10.4




Re: [HACKERS] logical changeset generation v6

2013-09-24 Thread Andres Freund
On 2013-09-23 23:12:53 -0400, Robert Haas wrote:
 What exactly is the purpose of this tool?  My impression is that the
 output of logical replication is a series of function calls to a
 logical replication plugin, but does that plugin necessarily have to
 produce an output format that gets streamed to a client via a tool
 like this?

There needs to be a client acking the reception of the data in some
form. There are currently two output methods, SQL and walstreamer, but
there could easily be further ones; it's basically two functions you have
to write.

There are several reasons I think the tool is useful, starting with the
fact that it makes the initial use of the feature easier. Writing a
client for CopyBoth messages wrapping 'w' style binary messages, with the
correct select() loop isn't exactly trivial. I also think it's actually
useful in real scenarios where you want to ship the data to a
remote system for auditing purposes.
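To illustrate why that receive loop is non-trivial: below is a pure-Python toy of a select()-driven loop consuming 'w'-style framed messages over a socketpair. The framing here (one type byte plus a one-byte length) is invented for the demo and is not the actual libpq CopyBoth/XLogData wire format:

```python
import select
import socket

def receive_messages(sock, on_message, timeout=1.0):
    """Toy receive loop: wait for readability, then read frames of the
    form b'w' + 1-byte length + payload. Stops on timeout or EOF."""
    msgs = []
    while True:
        ready, _, _ = select.select([sock], [], [], timeout)
        if not ready:
            break                      # nothing arrived within the timeout
        header = sock.recv(2)
        if len(header) < 2:
            break                      # EOF: peer closed the connection
        kind, length = header[0:1], header[1]
        payload = sock.recv(length)
        if kind == b"w":               # data message; other kinds ignored
            on_message(payload)
            msgs.append(payload)
    return msgs

# Demo: a socketpair stands in for the server connection.
server, client = socket.socketpair()
server.sendall(b"w" + bytes([5]) + b"BEGIN")
server.sendall(b"w" + bytes([6]) + b"COMMIT")
server.shutdown(socket.SHUT_WR)
got = receive_messages(client, lambda m: None)
```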

 For example, for replication, I'd think you might want the
 plugin to connect to a remote database and directly shove the data in;

That sounds like a bad idea to me. If you pull the data from the remote
side, you get the data in a streaming fashion and the latency sensitive
part of issuing statements to your local database is done locally.
Doing things synchronously like that also makes it way harder to use
synchronous_commit = off on the remote side, which is a tremendous
efficiency win.

If somebody needs something like this, e.g. because they want to
replicate into hundreds of shards depending on some key or such, the
question I don't know is how to actually initiate the
streaming. Somebody would need to start the logical decoding.

 for materialized views, we might like to push the changes into delta
 relations within the source database.

Yes, that's not a bad usecase and I think the only thing missing to use
output plugins that way is a convenient function to tell up to where
data has been received (aka synced to disk, aka applied).

  In either case, there's no
 particular need for any sort of client at all, and in fact it would be
 much better if none were required.  The existence of a tool like
 pg_receivellog seems to presuppose that the goal is spit out logical
 change records as text, but I'm not sure that's actually going to be a
 very common thing to want to do...

It doesn't really rely on anything being text - I've used it with a
binary plugin without problems. Obviously you might not want to use -f -
but an actual file instead...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 4:15 AM, Andres Freund and...@2ndquadrant.com wrote:
 There needs to be a client acking the reception of the data in some
 form. There are currently two output methods, SQL and walstreamer, but
 there could easily be further ones; it's basically two functions you have
 to write.

 There are several reasons I think the tool is useful, starting with the
 fact that it makes the initial use of the feature easier. Writing a
 client for CopyBoth messages wrapping 'w' style binary messages, with the
 correct select() loop isn't exactly trivial. I also think it's actually
 useful in real scenarios where you want to ship the data to a
 remote system for auditing purposes.

I have two basic points here:

- Requiring a client is a short-sighted design.  There's no reason we
shouldn't *support* having a client, but IMHO it shouldn't be the only
way to use the feature.

- Suppose that you use pg_receivellog (or whatever we decide to call
it) to suck down logical replication messages.  What exactly are you
going to do with that data once you've got it?  In the case of
pg_receivexlog it's quite obvious what you will do with the received
files: you'll store them in an archive of some kind and maybe eventually
use them for archive recovery, streaming replication, or PITR.  But
the answer here is a lot less obvious, at least to me.

 For example, for replication, I'd think you might want the
 plugin to connect to a remote database and directly shove the data in;

 That sounds like a bad idea to me. If you pull the data from the remote
 side, you get the data in a streaming fashion and the latency sensitive
 part of issuing statements to your local database is done locally.
 Doing things synchronously like that also makes it way harder to use
 synchronous_commit = off on the remote side, which is a tremendous
 efficiency win.

This sounds like the voice of experience talking, so I won't argue too
much, but I don't think it's central to my point.  And anyhow, even if
it is a bad idea, that doesn't mean someone won't want to do it.  :-)

 If somebody needs something like this, e.g. because they want to
 replicate into hundreds of shards depending on some key or such, the
 question I don't know is how to actually initiate the
 streaming. Somebody would need to start the logical decoding.

Sounds like a job for a background worker.  It would be pretty swell
if you could write a background worker that connects to a logical
replication slot and then does whatever.

 for materialized views, we might like to push the changes into delta
 relations within the source database.

 Yes, that's not a bad usecase and I think the only thing missing to use
 output plugins that way is a convenient function to tell up to where
 data has been received (aka synced to disk, aka applied).

Yes.  It feels to me (and I only work here) like the job of the output
plugin ought to be to put the data somewhere, and the replication code
shouldn't make too many assumptions about where it's actually going.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6

2013-09-24 Thread Andres Freund
On 2013-09-24 11:04:06 -0400, Robert Haas wrote:
 - Requiring a client is a short-sighted design.  There's no reason we
 shouldn't *support* having a client, but IMHO it shouldn't be the only
 way to use the feature.

There really aren't many limitations preventing you from doing anything
else.

 - Suppose that you use pg_receivellog (or whatever we decide to call
 it) to suck down logical replication messages.  What exactly are you
 going to do with that data once you've got it?  In the case of
 pg_receivexlog it's quite obvious what you will do with the received
 files: you'll store them in archive of some kind and maybe eventually
 use them for archive recovery, streaming replication, or PITR.  But
 the answer here is a lot less obvious, at least to me.

Well, it's not like it's going to be the only client. But it's a useful
one. I don't see this as an argument against pg_receivelogical? Most
sites don't use pg_receivexlog either.
Not having a consumer of the walsender interface included sounds like a
bad idea to me, even if it were only useful for testing. Now, you could
argue it should be in /contrib - and I wouldn't argue against that
except it shares code with the rest of src/bin/pg_basebackup.

  If somebody needs something like this, e.g. because they want to
  replicate into hundreds of shards depending on some key or such, the
  question I don't know is how to actually initiate the
  streaming. Somebody would need to start the logical decoding.

 Sounds like a job for a background worker.  It would be pretty swell
 if you could write a background worker that connects to a logical
 replication slot and then does whatever.

That's already possible. In that case you don't have to connect to a
walsender, although doing so would give you some parallelism, one
decoding the data, the other processing it ;).

There's one usecase I do not foresee decoupling from the walsender
interface this release though - synchronous logical replication. There
are currently no code changes required to make sync rep work for this,
and decoupling sync rep from walsender is too much to bite off in one
go.

  for materialized views, we might like to push the changes into delta
  relations within the source database.
 
  Yes, that's not a bad usecase and I think the only thing missing to use
  output plugins that way is a convenient function to tell up to where
  data has been received (aka synced to disk, aka applied).
 
 Yes.  It feels to me (and I only work here) like the job of the output
 plugin ought to be to put the data somewhere, and the replication code
 shouldn't make too many assumptions about where it's actually going.

The output plugin just has two functions it calls to send out data,
'prepare_write' and 'write'. The callsite has to provide those
callbacks. Two are included: walsender and an SQL SRF.

Check the 'test_logical_decoding' commit; it includes the SQL consumer.
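The two-callback design described above can be sketched like this (Python toy; the names are paraphrased from the description and this is not the actual logical decoding API):

```python
class Decoder:
    """Toy decoder that emits messages through caller-supplied callbacks,
    mirroring the 'prepare_write'/'write' split: the callsite decides how
    messages are framed and where they go (walsender, SQL SRF, ...)."""
    def __init__(self, prepare_write, write):
        self.prepare_write = prepare_write
        self.write = write

    def emit(self, lsn, payload):
        buf = self.prepare_write(lsn)  # callsite frames/headers the message
        buf.append(payload)
        self.write(buf)                # callsite ships it out

# One possible consumer: collect decoded messages in memory.
received = []
dec = Decoder(prepare_write=lambda lsn: [("lsn", lsn)],
              write=lambda buf: received.append(buf))
dec.emit(1, "BEGIN"); dec.emit(2, "INSERT"); dec.emit(3, "COMMIT")
```

Swapping in a different consumer (say, one that writes to a socket) only means supplying different callbacks; the decoding side is unchanged.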

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6

2013-09-24 Thread Steve Singer

On 09/24/2013 11:21 AM, Andres Freund wrote:

Not having a consumer of the walsender interface included sounds like a
bad idea to me, even if it were only useful for testing. Now, you could
argue it should be in /contrib - and I wouldn't argue against that
except it shares code with the rest of src/bin/pg_basebackup.


+1 on pg_receivellog (or whatever better name we pick) being somewhere.
I found the pg_receivellog code very useful as an example and for 
debugging/development purposes.
It isn't something that I see as useful for the average user, and I think 
the use-cases it meets are closer to other things we usually put in /contrib.


Steve





Re: [HACKERS] logical changeset generation v6

2013-09-23 Thread Andres Freund
On 2013-09-20 14:15:23 -0700, Peter Geoghegan wrote:
 I have a little bit of feedback that I forgot to mention in my earlier
 reviews, because I thought it was too trivial then: something about
 the name pg_receivellog annoys me in a way that the name
 pg_receivexlog does not. Specifically, it looks like someone meant to
 type pg_receivelog but fat-fingered it.

Yes, you're not the first to dislike it (including me).

pg_receivelogical? Protest now or forever hold your peace.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6

2013-09-23 Thread Peter Geoghegan
On Mon, Sep 23, 2013 at 1:46 AM, Andres Freund and...@2ndquadrant.com wrote:
 pg_receivelogical? Protest now or forever hold your peace.


I was thinking pg_receiveloglog, but that works just as well.

-- 
Peter Geoghegan




Re: [HACKERS] logical changeset generation v6

2013-09-23 Thread Andres Freund
On 2013-09-23 13:47:05 -0300, Alvaro Herrera wrote:
 Andres Freund escribió:
  On 2013-09-20 14:15:23 -0700, Peter Geoghegan wrote:
   I have a little bit of feedback that I forgot to mention in my earlier
   reviews, because I thought it was too trivial then: something about
   the name pg_receivellog annoys me in a way that the name
   pg_receivexlog does not. Specifically, it looks like someone meant to
   type pg_receivelog but fat-fingered it.
  
  Yes, you're not the first to dislike it (including me).
  
  pg_receivelogical? Protest now or forever hold your peace.
 
 I had proposed pg_recvlogical

I still find it weird/inconsistent to have:
* pg_receivexlog
* pg_recvlogical
binaries, even from the same source directory. Why once pg_recv and
once pg_receive?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] logical changeset generation v6

2013-09-23 Thread Alvaro Herrera
Andres Freund escribió:
 On 2013-09-20 14:15:23 -0700, Peter Geoghegan wrote:
  I have a little bit of feedback that I forgot to mention in my earlier
  reviews, because I thought it was too trivial then: something about
  the name pg_receivellog annoys me in a way that the name
  pg_receivexlog does not. Specifically, it looks like someone meant to
  type pg_receivelog but fat-fingered it.
 
 Yes, you're not the first to dislike it (including me).
 
 pg_receivelogical? Protest now or forever hold your peace.

I had proposed pg_recvlogical

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] logical changeset generation v6

2013-09-23 Thread Peter Geoghegan
On Mon, Sep 23, 2013 at 9:54 AM, Andres Freund and...@2ndquadrant.com wrote:
 I still find it weird/inconsistent to have:
 * pg_receivexlog
 * pg_recvlogical
 binaries, even from the same source directory. Why once pg_recv and
 once pg_receive?

+1

-- 
Peter Geoghegan




Re: [HACKERS] logical changeset generation v6

2013-09-23 Thread Alvaro Herrera
Andres Freund escribió:
 On 2013-09-23 13:47:05 -0300, Alvaro Herrera wrote:

  I had proposed pg_recvlogical
 
 I still find it weird/inconsistent to have:
 * pg_receivexlog
 * pg_recvlogical
 binaries, even from the same source directory. Why once pg_recv and
 once pg_receive?

Well.  What are the principles we want to follow when choosing a name?
Is consistency the first and foremost consideration?  To me, whether names
are exactly consistent is not all that relevant; I prefer a shorter name
if it embodies all it means.  For that reason I didn't like the
receiveloglog suggestion: it's not clear what the two log bits are.
To me this suggests that logical should not be shortened.  But the
recv thing is clearly receive, isn't it?  Enough that it can be
shortened without loss of meaning.

If we consider consistency in naming of tools is uber-important, well,
obviously my proposal is dead.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] logical changeset generation v6

2013-09-23 Thread Peter Eisentraut
On 9/23/13 12:54 PM, Andres Freund wrote:
 I still find it weird/inconsistent to have:
 * pg_receivexlog
 * pg_recvlogical
 binaries, even from the same source directory. Why once pg_recv and
 once pg_receive?

It's consistent because they are the same length!

(Obviously, this would severely restrict future tool naming.)

In all seriousness, I like this naming best so far.




Re: [HACKERS] logical changeset generation v6

2013-09-23 Thread Robert Haas
On Mon, Sep 23, 2013 at 1:11 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Andres Freund escribió:
 On 2013-09-23 13:47:05 -0300, Alvaro Herrera wrote:

  I had proposed pg_recvlogical

 I still find it weird/inconsistent to have:
 * pg_receivexlog
 * pg_recvlogical
 binaries, even from the same source directory. Why once pg_recv and
 once pg_receive?

 Well.  What are the principles we want to follow when choosing a name?
 Is consistency the first and foremost consideration?  To me, whether names
 are exactly consistent is not all that relevant; I prefer a shorter name
 if it embodies all it means.  For that reason I didn't like the
 receiveloglog suggestion: it's not clear what the two log bits are.
 To me this suggests that logical should not be shortened.  But the
 recv thing is clearly receive, isn't it?  Enough that it can be
 shortened without loss of meaning.

 If we consider consistency in naming of tools is uber-important, well,
 obviously my proposal is dead.

What exactly is the purpose of this tool?  My impression is that the
output of logical replication is a series of function calls to a
logical replication plugin, but does that plugin necessarily have to
produce an output format that gets streamed to a client via a tool
like this?  For example, for replication, I'd think you might want the
plugin to connect to a remote database and directly shove the data in;
for materialized views, we might like to push the changes into delta
relations within the source database.  In either case, there's no
particular need for any sort of client at all, and in fact it would be
much better if none were required.  The existence of a tool like
pg_receivellog seems to presuppose that the goal is to spit out logical
change records as text, but I'm not sure that's actually going to be a
very common thing to want to do...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6

2013-09-23 Thread Peter Geoghegan
On Mon, Sep 23, 2013 at 8:12 PM, Robert Haas robertmh...@gmail.com wrote:
 The existence of a tool like
 pg_receivellog seems to presuppose that the goal is to spit out logical
 change records as text, but I'm not sure that's actually going to be a
 very common thing to want to do...

Sure, but I think it's still worth having, for debugging purposes and
so on. Perhaps the incorrect presupposition is that it deserves to
live in /bin and not /contrib. Also, even though the tool is
derivative of pg_receivexlog, its reason for existing is sufficiently
different that maybe it deserves an entirely distinct name. On the
other hand, precisely because it's derivative of
receivelog/pg_receivexlog, it kind of makes sense to group them
together like that. So I don't know.


-- 
Peter Geoghegan




Re: [HACKERS] logical changeset generation v6

2013-09-21 Thread Steve Singer

On 09/20/2013 06:33 AM, Andres Freund wrote:

Hi,




The points I find daunting are the semantics, like:
* How do we control whether a standby is allowed to prevent WAL file
   removal. What if archiving is configured?
* How do we control whether a standby is allowed to peg xmin?
* How long do we peg an xmin/wal file removal if the standby is gone
* How does the user interface look to remove a slot if a standby is gone
* How do we decide/control which commands use a slot in which cases?


I think we are going to want to be flexible enough to support users with 
a couple of different use-cases:
* Some people will want to keep xmin pegged and prevent WAL removal so a 
standby with a slot can always catch up.
* Most people will want to say: keep X megabytes of WAL (if needed by a 
behind slot) and keep xmin pegged so that the WAL can be consumed by a 
logical plugin.


I can see us also implementing a restore_command that the walsender 
could use to get archived segments but for logical replication xmin 
would still need to be low enough.


I don't think the current patch set is incompatible with us later 
implementing any of the above. I'd rather see us focus on getting the 
core functionality committed and worry about a good interface for 
managing slots later.



Greetings, Andres Freund 


Steve





Re: [HACKERS] logical changeset generation v6

2013-09-20 Thread Andres Freund
Hi,

On 2013-09-19 12:05:35 -0400, Robert Haas wrote:
 No question.  I'm not saying that that optimization shouldn't go in
 right after the main patch does, but IMHO right now there are too many
 things going in the 0004 patch to discuss them all simultaneously.
 I'd like to find a way of splitting this up that will let us
 block-and-tackle individual pieces of it, even if we end up committing
 them all one right after the other.

Fine with me. I was critized for splitting up stuff too much before ;)

Expect a finer-grained series.

 But that raises an interesting question: why is the overhead so bad?
 I mean, this shouldn't be any worse than having a series of
 transactions running concurrently with pgbench that take a snapshot
 and hold it for as long as it takes the decoding process to decode the
 most-recently committed transaction.

Pgbench really slows down scarily if there are some slightly longer
running transactions around...

 Is the issue here that we can't
 advance xmin until we've fsync'd the fruits of decoding down to disk?

Basically yes. We only advance the xmin of the slot so far that we could
still build a valid snapshot to decode the first transaction not
confirmed to have been synced to disk by the client.

 If so, that's mighty painful.  But we'd really only need to hold back
 xmin in that situation when some catalog change has occurred
 meanwhile, which for pgbench means never.  So something seems fishy
 here.

It's less simple than that. We need to protect against concurrent DDL
producing deleted rows that we will still need. We need
HeapTupleSatisfiesVacuum() to return HEAPTUPLE_RECENTLY_DEAD not
HEAPTUPLE_DEAD for such rows, right?
The way to do that is to guarantee that for any such row whose
TransactionIdDidCommit(xmax) is true, TransactionIdPrecedes(xmax,
OldestXmin) is false.
So, we need to peg OldestXmin (as passed to HTSV) to the xid of the
oldest transaction we're still decoding.

I am not sure how you could do that if DDL has started somewhere in the
future, since there's no interlock preventing anyone from doing so.

  - It still bothers me that we're going to have mandatory slots for
  logical replication and no slots for physical replication.

  If people think this needs to be a general facility from the start, I
  can be convinced that way, but I think there's so much to discuss around
  the semantics and different usecases that I'd much prefer to discuss
  that later.
 
 I'm worried that if we don't know how the physical replication slots
 are going to work, they'll end up being randomly different from the
 logical replication slots, and that'll be an API wart which we'll have
 a hard time getting rid of later.

Hm. I actually think that, minus some s/Logical//g and a mv, there won't
be much that needs to change on the slot interface itself.

What we need for physical rep is basically to a) store the position up
to where the primary has fsynced the WAL b) store the xmin horizon the standby
currently has.
Sure, we can store more stats (most of pg_stat_replication, perhaps some
more) but that's not functionally critical and not hard to extend.

The points I find daunting are the semantics, like:
* How do we control whether a standby is allowed to prevent WAL file
  removal. What if archiving is configured?
* How do we control whether a standby is allowed to peg xmin?
* How long do we peg an xmin/wal file removal if the standby is gone
* How does the user interface look to remove a slot if a standby is gone
* How do we decide/control which commands use a slot in which cases?


  - What is the purpose of (Un)SuspendDecodingSnapshots?  It seems that
  should be explained somewhere.  I have my doubts about how safe that
  is.
 
  I'll document the details if they aren't right now. Consider what
  happens if somebody does something like: VACUUM FULL pg_am;. If we
  were to build the relation descriptor of pg_am in an historical
  snapshot, as you coin it, we'd have the wrong filenode in there. And
  consequently any future lookups in pg_am will open a file that doesn't
  exist.
  That problem only exists for non-nailed relations that are accessed
  during decoding.
 
 But if it's some user table flagged with the terribly-named
 treat_as_catalog_table flag, then they could have not only changed the
 relfilenode but also the tupledesc.  And then you can't just wave your
 hands at the problem.

Heh. Well caught.

There's a comment about that somewhere... Those are problematic, my plan
so far is to throw my hands up and forbid alter tables that rewrite
those.

I know you don't like that flag and especially its name. I am open to
suggestions to a) rename it b) find a better solution. I am pretty sure
a) is possible but I have severe doubts about any realistic b).

  Yes, I don't like it either. I am not sure what to replace it with
  though. It's easy enough to fit something in GetCatalogSnapshot() and I
  don't have a problem with that, but I am less happy with adding code
  like that to GetSnapshotData() for 

Re: [HACKERS] logical changeset generation v6

2013-09-20 Thread Peter Geoghegan
On Thu, Sep 19, 2013 at 7:43 AM, Andres Freund and...@2ndquadrant.com wrote:
 More generally, the thing that bugs me about this approach is that
 logical replication is not really special, but what you've done here
 MAKES it special. There are plenty of other situations where we are
 too aggressive about holding back xmin.  A single-statement
 transaction that selects from a single table holds back xmin for the
 entire cluster, and that is Very Bad.  It would probably be unfair to
 say, well, you have to solve that problem first.  But on the other
 hand, how do we know that the approach you've adopted here isn't going
 to make the more general problem harder to solve?  It seems invasive
 at a pretty low level.

I agree that it's invasive, but I am doubtful that pegging the xmin in
a more granular fashion precludes this kind of optimization. We might
have to generalize what Andres has done, which could mean eventually
throwing it out and starting from scratch, but I have a hard time
seeing how that implies an appreciable cost above solving the general
problem first (now that Andres has already implemented the
RecentGlobalDataXmin thing). As I'm sure you appreciate, the cost of
doing the opposite - of solving the general problem first - may be
huge: waiting another release for logical changeset generation.

 The reason why I think it's actually different is that the user actually
 has control over how long transactions are running on the primary. They
 don't really control how fast a replication consumer consumes and how
 often it sends feedback messages.

Right. That's about what I said last year.

I find the following analogy useful: A logical changeset generation
implementation without RecentGlobalDataXmin is kind of like an
old-fashioned nuclear reactor, like the one they had at Chernobyl.
Engineers have to actively work in order to prevent it from
overheating. However, an implementation with RecentGlobalDataXmin is
like a modern, much safer nuclear reactor. Engineers have to actively
work to keep the reactor heated. Which is to say, with
RecentGlobalDataXmin a standby that dies cannot bloat the master too
much (almost as with hot_standby_feedback - that too requires active
participation from the standby to do harm to the master). Without
RecentGlobalDataXmin, the core system and the plugin at the very least
need to worry about that case when a standby dies.

I have a little bit of feedback that I forgot to mention in my earlier
reviews, because I thought it was too trivial then: something about
the name pg_receivellog annoys me in a way that the name
pg_receivexlog does not. Specifically, it looks like someone meant to
type pg_receivelog but fat-fingered it.

-- 
Peter Geoghegan




Re: [HACKERS] logical changeset generation v6

2013-09-19 Thread Robert Haas
On Tue, Sep 17, 2013 at 10:31 AM, Andres Freund and...@2ndquadrant.com wrote:
 Rebased patches attached.

I spent a bit of time looking at these patches yesterday and today.
It seems to me that there's a fair amount of stylistic cleanup that is
still needed, and some pretty bad naming choices, and some FIXMEs that
should probably be fixed, but for an initial review email it seems
more helpful to focus on high-level design, so here goes.

- Looking specifically at the 0004 patch, I think that the
RecentGlobalDataXmin changes are logically separate from the rest of
the patch, and that if we're going to commit them at all, it should be
separate from the rest of this.  I think this is basically a
performance optimization.  AFAICS, there's no correctness problem with
the idea of continuing to maintain a single RecentGlobalXmin; it's
just that you'll potentially end up with quite a lot of bloat.  But
that's not an argument that those two things HAVE to be committed
together; either could be done first, and independently of the other.
Also, these changes are quite complex and it's difficult to form a
judgement as to whether this idea is safe when they are intermingled
with the rest of the logical replication stuff.

More generally, the thing that bugs me about this approach is that
logical replication is not really special, but what you've done here
MAKES it special. There are plenty of other situations where we are
too aggressive about holding back xmin.  A single-statement
transaction that selects from a single table holds back xmin for the
entire cluster, and that is Very Bad.  It would probably be unfair to
say, well, you have to solve that problem first.  But on the other
hand, how do we know that the approach you've adopted here isn't going
to make the more general problem harder to solve?  It seems invasive
at a pretty low level.  I think we should at least spend some time
thinking about what *general* solutions to this problem would look
like and then decide whether this approach is likely to be
forward-compatible with those solutions.

- There are no changes to the doc directory.  Obviously, if you're
going to add a new value for the wal_level GUC, it's gonna need to be
documented. Similarly, pg_receivellog needs to be documented.  In all
likelihood, you also need a whole chapter providing general background
on this technology.  A couple of README files is not going to do it,
and those files aren't suitable for check-in anyway (e.g. DESIGN.txt
makes reference to a URL where the current version of some patch can
be found; that's not appropriate for permanent documentation).  But
aside from that, what we really need here is user documentation, not
developer documentation.  I can perhaps pass judgement on whether the
guts of this functionality do things that are fundamentally unsafe,
but whether the user interface is good or bad is a question that
deserves broader input, and without documentation, most people aren't
going to understand it well enough to know whether they like it.  And
TBH, *I* don't really want to reverse-engineer what pg_receivellog
does from a read-through of the code, either.

- Given that we don't reassemble transactions until commit time, why
do we need to ensure that XIDs are logged before their sub-XIDs
appear in WAL?  As I've said previously, I actually think that
on-the-fly reassembly is probably going to turn out to be very
important.  But if we're not getting that, do we really need this?
Also, while I'm trying to keep this email focused on high-level
concerns, I have to say that guaranteedlyLogged has got to be one of
the worst variable names I've ever seen, starting (but not ending)
with the fact that guaranteedly is not a word.  I'm also tempted to
say that all of the wal_level=logical changes should be split out as
their own patch, separate from the decoding stuff.  Personally, I
would find that much easier to review, although I admit it's less
critical here than for the RecentGlobalDataXmin stuff.

- If XLOG_HEAP_INPLACE is not decoded, doesn't that mean that this
facility can't be used to replicate a system catalog table?  Is that a
restriction we should enforce/document somehow?

- The new code is rather light on comments.  decode.c is extremely
light. For example, consider the function DecodeAbort(), which
contains the following comment:

+   /*
+* this is a bit grotty, but if we're faking an abort we've
already gone
+* through
+*/

Well, I have no idea what that means.  I'm sure you do, but I bet the
next person who isn't you that tries to understand this probably
won't.  It's also probably true that I could figure it out if I spent
more time on it, but I think the point of comments is to keep the
amount of time that must be spent trying to understand code to a
manageable level.  Generally, I'd suggest that any non-trivial
functions in these files should have a header comment explaining what
their job is; e.g. for DecodeStandbyOp you 

Re: [HACKERS] logical changeset generation v6

2013-09-19 Thread Andres Freund
Hi Robert,

On 2013-09-19 10:02:31 -0400, Robert Haas wrote:
 On Tue, Sep 17, 2013 at 10:31 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  Rebased patches attached.
 
 I spent a bit of time looking at these patches yesterday and today.
 It seems to me that there's a fair amount of stylistic cleanup that is
 still needed, and some pretty bad naming choices, and some FIXMEs that
 should probably be fixed, but for an initial review email it seems
 more helpful to focus on high-level design, so here goes.

Thanks for looking at it.

Yes, I think the highlevel stuff is the important bit.

As you note, the documentation needs to be written and that's certainly
not a small task. But doing so before the highlevel design is agreed
upon makes it too likely that it will need to be entirely scrapped.

 - Looking specifically at the 0004 patch, I think that the
 RecentGlobalDataXmin changes are logically separate from the rest of
 the patch, and that if we're going to commit them at all, it should be
 separate from the rest of this.  I think this is basically a
 performance optimization.  AFAICS, there's no correctness problem with
 the idea of continuing to maintain a single RecentGlobalXmin; it's
 just that you'll potentially end up with quite a lot of bloat.  But
 that's not an argument that those two things HAVE to be committed
 together; either could be done first, and independently of the other.
 Also, these changes are quite complex and it's difficult to form a
 judgement as to whether this idea is safe when they are intermingled
 with the rest of the logical replication stuff.

Up until v3 the RecentGlobalDataXmin stuff wasn't included and reviewers
(primarily Peter G. on -hackers and Greg Stark at pgconf.eu) remarked on
that and considered it critical. I argued for a considerable amount of
time that it shouldn't be done in an initial patch and then gave in.

They have a point though, if you e.g. replicate a pgbench -j16 workload
the addition of RecentGlobalDataXmin reduces the performance impact of
replication from about 60% to less than 5% in my measurements. Turns out
heap pruning is damn critical for that kind of workload.

 More generally, the thing that bugs me about this approach is that
 logical replication is not really special, but what you've done here
 MAKES it special. There are plenty of other situations where we are
 too aggressive about holding back xmin.  A single-statement
 transaction that selects from a single table holds back xmin for the
 entire cluster, and that is Very Bad.  It would probably be unfair to
 say, well, you have to solve that problem first.  But on the other
 hand, how do we know that the approach you've adopted here isn't going
 to make the more general problem harder to solve?  It seems invasive
 at a pretty low level.

The reason why I think it's actually different is that the user actually
has control over how long transactions are running on the primary. They
don't really control how fast a replication consumer consumes and how
often it sends feedback messages.

 I think we should at least spend some time
 thinking about what *general* solutions to this problem would look
 like and then decide whether this approach is likely to be
 forward-compatible with those solutions.

I thought about the general case for a good bit and decided that all
solutions that work in a more general scenario are complex enough that I
don't want to implement them. And I don't really see any backward
compatibility concerns here - removing the logic of using a separate
horizon for user tables in contrast to system tables is pretty trivial
and shouldn't have any external effect. Except pegging the horizon more,
but that's what the new approach would fix, right?

 - Given that we don't reassemble transactions until commit time, why
 do we need to ensure that XIDs are logged before their sub-XIDs
 appear in WAL?

Currently it's important to know where the oldest transaction that is
still alive started, to determine from where we need to restart
decoding. That's done by keeping an lsn-ordered list of in-progress
toplevel transactions. The above invariant makes it cheap to maintain
that list.

 As I've said previously, I actually think that
 on-the-fly reassembly is probably going to turn out to be very
 important. But if we're not getting that, do we really need this?

It's also preparatory for supporting that.

I agree that it's pretty important, but after actually having
implemented a replication solution using this, I still think that most
usecase won't using it when available. I plan to work on implementing
that.

 Also, while I'm trying to keep this email focused on high-level
 concerns, I have to say that guaranteedlyLogged has got to be one of
 the worst variable names I've ever seen, starting (but not ending)
 with the fact that guaranteedly is not a word.  I'm also tempted to
 say that all of the wal_level=logical changes should be split out as
 their own patch, separate from the 

Re: [HACKERS] logical changeset generation v6

2013-09-19 Thread Robert Haas
On Thu, Sep 19, 2013 at 10:43 AM, Andres Freund and...@2ndquadrant.com wrote:
 - Looking specifically at the 0004 patch, I think that the
 RecentGlobalDataXmin changes are logically separate from the rest of
 the patch, and that if we're going to commit them at all, it should be
 separate from the rest of this.  I think this is basically a
 performance optimization.  AFAICS, there's no correctness problem with
 the idea of continuing to maintain a single RecentGlobalXmin; it's
 just that you'll potentially end up with quite a lot of bloat.  But
 that's not an argument that those two things HAVE to be committed
 together; either could be done first, and independently of the other.
 Also, these changes are quite complex and it's difficult to form a
 judgement as to whether this idea is safe when they are intermingled
 with the rest of the logical replication stuff.

 Up until v3 the RecentGlobalDataXmin stuff wasn't included and reviewers
 (primarily Peter G. on -hackers and Greg Stark at pgconf.eu) remarked on
 that and considered it critical. I argued for a considerable amount of
 time that it shouldn't be done in an initial patch and then gave in.

 They have a point though, if you e.g. replicate a pgbench -j16 workload
 the addition of RecentGlobalDataXmin reduces the performance impact of
 replication from about 60% to less than 5% in my measurements. Turns out
 heap pruning is damn critical for that kind of workload.

No question.  I'm not saying that that optimization shouldn't go in
right after the main patch does, but IMHO right now there are too many
things going in the 0004 patch to discuss them all simultaneously.
I'd like to find a way of splitting this up that will let us
block-and-tackle individual pieces of it, even if we end up committing
them all one right after the other.

But that raises an interesting question: why is the overhead so bad?
I mean, this shouldn't be any worse than having a series of
transactions running concurrently with pgbench that take a snapshot
and hold it for as long as it takes the decoding process to decode the
most-recently committed transaction.  Is the issue here that we can't
advance xmin until we've fsync'd the fruits of decoding down to disk?
If so, that's mighty painful.  But we'd really only need to hold back
xmin in that situation when some catalog change has occurred
meanwhile, which for pgbench means never.  So something seems fishy
here.

 I thought about the general case for a good bit and decided that all
 solutions that work in a more general scenario are complex enough that I
 don't want to implement them. And I don't really see any backward
 compatibility concerns here - removing the logic of using a separate
 horizon for user tables in contrast to system tables is pretty trivial
 and shouldn't have any external effect. Except pegging the horizon more,
 but that's what the new approach would fix, right?

Hmm, maybe.

 Also, while I'm trying to keep this email focused on high-level
 concerns, I have to say that guaranteedlyLogged has got to be one of
 the worst variable names I've ever seen, starting (but not ending)
 with the fact that guaranteedly is not a word.  I'm also tempted to
 say that all of the wal_level=logical changes should be split out as
 their own patch, separate from the decoding stuff.  Personally, I
 would find that much easier to review, although I admit it's less
 critical here than for the RecentGlobalDataXmin stuff.

 I can do that again and it actually was that way in the past. But
 there's no user for it before the later patch and it's hard to
 understand the reasoning for the changed wal logging separately, that's
 why I merged it at some point.

OK.  If I'm committing it, I'd prefer to handle that piece separately,
if possible.

 - If XLOG_HEAP_INPLACE is not decoded, doesn't that mean that this
 facility can't be used to replicate a system catalog table?  Is that a
 restriction we should enforce/document somehow?

 Currently catalog tables aren't replicated, yes. They simply are skipped
 during decoding. XLOG_HEAP_INPLACE isn't the primary reason for that
 though.

 Do you see a usecase for it?

I can imagine someone wanting to do it, but I think we can live with
it not being supported.

 - It still bothers me that we're going to have mandatory slots for
 logical replication and no slots for physical replication.  Why are
 slots mandatory in one case and not even allowable in the other?  Is
 it sensible to think about slotless logical replication - or maybe I
 should say, is it any LESS sensible than slotless physical
 replication?

 Well, as you know, I do want to have slots for physical replication as
 well. But there actually is a fundamental difference why we need it for
 logical rep and not for physical: In physical replication, if the xmin
 progresses too far, client queries will be cancelled. Annoying but not
 fatal. In logical replication we will not be able to continue
 replicating since we cannot decode the WAL 

Re: [HACKERS] logical changeset generation v6

2013-09-18 Thread Fujii Masao
On Tue, Sep 17, 2013 at 11:31 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-17 09:45:28 -0400, Peter Eisentraut wrote:
 On 9/15/13 11:30 AM, Andres Freund wrote:
  On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote:
  On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote:
  Attached you can find the newest version of the logical changeset
  generation patchset.
 
  You probably have bigger things to worry about, but please check the
  results of cpluspluscheck, because some of the header files don't
  include header files they depend on.
 
  Hm. I tried to get that right, but it's been a while since I last
  checked. I don't regularly use cpluspluscheck because it doesn't work in
  VPATH builds... We really need to fix that.
 
  I'll push a fix for that to the git tree, don't think that's worth a
  resend in itself.

 This patch set now fails to apply because of the commit "Rename various
 freeze multixact variables".

 And I am even partially guilty for that patch...

 Rebased patches attached.

When I applied all the patches and compiled the source, I got the following error:

gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -I. -I../../../../src/include -D_GNU_SOURCE   -c -o
snapbuild.o snapbuild.c
snapbuild.c:187: error: redefinition of typedef 'SnapBuild'
../../../../src/include/replication/snapbuild.h:45: note: previous
declaration of 'SnapBuild' was here
make[4]: *** [snapbuild.o] Error 1


When I applied only
0001-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch,
compiled the source, and set up asynchronous replication, I got a
segmentation fault.

LOG:  server process (PID 12777) was terminated by signal 11:
Segmentation fault

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6

2013-09-17 Thread Peter Eisentraut
On 9/15/13 11:30 AM, Andres Freund wrote:
 On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote:
 On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote:
 Attached you can find the newest version of the logical changeset
 generation patchset.

 You probably have bigger things to worry about, but please check the
 results of cpluspluscheck, because some of the header files don't
 include header files they depend on.
 
 Hm. I tried to get that right, but it's been a while since I last
 checked. I don't regularly use cpluspluscheck because it doesn't work in
 VPATH builds... We really need to fix that.
 
 I'll push a fix for that to the git tree, don't think that's worth a
 resend in itself.

This patch set now fails to apply because of the commit "Rename various
freeze multixact variables".





Re: [HACKERS] logical changeset generation v6

2013-09-15 Thread Peter Eisentraut
What's with 0001-Improve-regression-test-for-8410.patch?  Did you mean
to include that?





Re: [HACKERS] logical changeset generation v6

2013-09-15 Thread Andres Freund
On 2013-09-15 10:03:54 -0400, Peter Eisentraut wrote:
 What's with 0001-Improve-regression-test-for-8410.patch?  Did you mean
 to include that?

Gah, no. That's already committed and unrelated. Stupid wildcard.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6

2013-09-15 Thread Peter Eisentraut
On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote:
 Attached you can find the newest version of the logical changeset
 generation patchset.

You probably have bigger things to worry about, but please check the
results of cpluspluscheck, because some of the header files don't
include header files they depend on.

(I guess that's really pgcompinclude's job to find out, but
cpluspluscheck seems to be easier to use.)






Re: [HACKERS] logical changeset generation v6

2013-09-15 Thread Andres Freund
On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote:
 On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote:
  Attached you can find the newest version of the logical changeset
  generation patchset.
 
 You probably have bigger things to worry about, but please check the
 results of cpluspluscheck, because some of the header files don't
 include header files they depend on.

Hm. I tried to get that right, but it's been a while since I last
checked. I don't regularly use cpluspluscheck because it doesn't work in
VPATH builds... We really need to fix that.

I'll push a fix for that to the git tree, don't think that's worth a
resend in itself.

Thanks,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

