Re: [HACKERS] logical changeset generation v6
On Mon, Sep 23, 2013 at 7:03 PM, Peter Geoghegan p...@heroku.com wrote: On Mon, Sep 23, 2013 at 9:54 AM, Andres Freund and...@2ndquadrant.com wrote: I still find it weird/inconsistent to have: * pg_receivexlog * pg_recvlogical binaries, even from the same source directory. Why once pg_recv and once pg_receive? +1 Digging up a really old thread since I just got annoyed by the inconsistent naming for the first time myself :) I can't find that this discussion actually came to a proper consensus, but I may be missing something. Did we go with pg_recvlogical just because we couldn't decide on a better name, or did we intentionally decide it was the best? I definitely think pg_receivelogical would be a better name, for consistency (because it's way too late to rename pg_receivexlog of course - once released that can't really change. Which is why *if* we want to change the name of pg_recvlogical we have a few more days to make a decision..) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] logical changeset generation v6
On 2014-04-24 09:39:21 +0200, Magnus Hagander wrote: I can't find that this discussion actually came to a proper consensus, but I may be missing something. Did we go with pg_recvlogical just because we couldn't decide on a better name, or did we intentionally decide it was the best? I went with pg_recvlogical because that's where the (small) majority seemed to be. Even if I was unconvinced. There were so many outstanding big fights at that point that I didn't want to spend my time on this ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6
On Thu, Apr 24, 2014 at 9:43 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-24 09:39:21 +0200, Magnus Hagander wrote: I can't find that this discussion actually came to a proper consensus, but I may be missing something. Did we go with pg_recvlogical just because we couldn't decide on a better name, or did we intentionally decide it was the best? I went with pg_recvlogical because that's where the (small) majority seemed to be. Even if I was unconvinced. There were so many outstanding big fights at that point that I didn't want to spend my time on this ;) I was guessing something like the second part there, which is why I figured this would be a good time to bring this fight back up to the surface ;) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] logical changeset generation v6
On 2014-04-24 09:46:07 +0200, Magnus Hagander wrote: On Thu, Apr 24, 2014 at 9:43 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-04-24 09:39:21 +0200, Magnus Hagander wrote: I can't find that this discussion actually came to a proper consensus, but I may be missing something. Did we go with pg_recvlogical just because we couldn't decide on a better name, or did we intentionally decide it was the best? I went with pg_recvlogical because that's where the (small) majority seemed to be. Even if I was unconvinced. There were so many outstanding big fights at that point that I didn't want to spend my time on this ;) I was guessing something like the second part there, which is why I figured this would be a good time to bring this fight back up to the surface ;) I have to admit that I still don't care too much. By now I'd tentatively want to stay with the current name because that's what I got used to, but if somebody else has strong opinions and finds consensus... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
On 09/27/2013 05:18 PM, Andres Freund wrote: Hi Steve, On 2013-09-27 17:06:59 -0400, Steve Singer wrote: I've determined that in this test the walsender seems to be hitting this when it is decoding the transactions that are behind the slonik commands to add tables to replication (set add table, set add sequence). This is before the SUBSCRIBE SET is submitted. I've also noticed something else that is strange (but might be unrelated). If I stop my slon process and restart it I get messages like: WARNING: Starting logical replication from 0/a9321360 ERROR: cannot stream from 0/A9321360, minimum is 0/A9320B00 Where 0/A9321360 was sent in the last packet my slon received from the walsender before the restart. Uh, that looks like I fumbled some comparison. Let me check. I've further narrowed this down to something (or some combination) of what the _disorder_replica.altertableaddTriggers(1); stored function does (or @SLONYNAMESPACE@.altertableaddTriggers(int)); which is essentially: * Get an exclusive lock on sl_config_lock * Get an exclusive lock on the user table in question * create a trigger (the deny access trigger) * create a truncate trigger * create a deny truncate trigger I am not yet able to replicate the error by issuing the same SQL commands from psql, but I must be missing something. I can replicate this when just using the test_decoding plugin. Thanks. That should get me started with debugging. Unless it's possibly fixed in the latest version, one bug fixed there might cause something like this if the moon stands exactly right? The latest version has NOT fixed the problem. Also, I was a bit inaccurate in my previous descriptions. To clarify: 1. I am sometimes getting that 'unexpected duplicate' error 2.
The 'set add table' which triggers those functions that create and configure triggers is actually causing the walsender to hit the following assertion:
#2 0x00773d47 in ExceptionalCondition (conditionName=conditionName@entry=0x8cf400 "!(ent->cmin == change->tuplecid.cmin)", errorType=errorType@entry=0x7ab830 "FailedAssertion", fileName=fileName@entry=0x8cecc3 "reorderbuffer.c", lineNumber=lineNumber@entry=1162) at assert.c:54
#3 0x00665480 in ReorderBufferBuildTupleCidHash (txn=0x1b6e610, rb=<optimized out>) at reorderbuffer.c:1162
#4 ReorderBufferCommit (rb=0x1b6e4f8, xid=<optimized out>, commit_lsn=3461001952, end_lsn=<optimized out>) at reorderbuffer.c:1285
#5 0x0065f0f7 in DecodeCommit (xid=<optimized out>, nsubxacts=<optimized out>, sub_xids=<optimized out>, ninval_msgs=16, msgs=0x1b637c0, buf=0x7fff54d01530, buf=0x7fff54d01530, ctx=0x1adb928, ctx=0x1adb928) at decode.c:477
I had added an assert(false) to the code where the 'unexpected duplicate' error was logged to make spotting this easier, but yesterday I didn't double check that I was hitting the assertion I added versus this other one. I can't yet say if this is two unrelated issues or if I'd get to the 'unexpected duplicate' message immediately after. Greetings, Andres Freund
Re: [HACKERS] logical changeset generation v6
On 09/26/2013 02:47 PM, Steve Singer wrote: I've determined that in this test the walsender seems to be hitting this when it is decoding the transactions that are behind the slonik commands to add tables to replication (set add table, set add sequence). This is before the SUBSCRIBE SET is submitted. I've also noticed something else that is strange (but might be unrelated). If I stop my slon process and restart it I get messages like: WARNING: Starting logical replication from 0/a9321360 ERROR: cannot stream from 0/A9321360, minimum is 0/A9320B00 Where 0/A9321360 was sent in the last packet my slon received from the walsender before the restart. If I force it to restart replication from 0/A9320B00 I see data rows that I appear to have already seen before the restart. I think this is happening because my slon was killed after it processed the data for 0/A9320B00 but before it sent the feedback message. Is this expected? I've further narrowed this down to something (or some combination) of what the _disorder_replica.altertableaddTriggers(1); stored function does (or @SLONYNAMESPACE@.altertableaddTriggers(int)); which is essentially: * Get an exclusive lock on sl_config_lock * Get an exclusive lock on the user table in question * create a trigger (the deny access trigger) * create a truncate trigger * create a deny truncate trigger I am not yet able to replicate the error by issuing the same SQL commands from psql, but I must be missing something. I can replicate this when just using the test_decoding plugin. Greetings, Andres Freund
Re: [HACKERS] logical changeset generation v6
Hi Steve, On 2013-09-27 17:06:59 -0400, Steve Singer wrote: I've determined that in this test the walsender seems to be hitting this when it is decoding the transactions that are behind the slonik commands to add tables to replication (set add table, set add sequence). This is before the SUBSCRIBE SET is submitted. I've also noticed something else that is strange (but might be unrelated). If I stop my slon process and restart it I get messages like: WARNING: Starting logical replication from 0/a9321360 ERROR: cannot stream from 0/A9321360, minimum is 0/A9320B00 Where 0/A9321360 was sent in the last packet my slon received from the walsender before the restart. Uh, that looks like I fumbled some comparison. Let me check. I've further narrowed this down to something (or some combination) of what the _disorder_replica.altertableaddTriggers(1); stored function does (or @SLONYNAMESPACE@.altertableaddTriggers(int)); which is essentially: * Get an exclusive lock on sl_config_lock * Get an exclusive lock on the user table in question * create a trigger (the deny access trigger) * create a truncate trigger * create a deny truncate trigger I am not yet able to replicate the error by issuing the same SQL commands from psql, but I must be missing something. I can replicate this when just using the test_decoding plugin. Thanks. That should get me started with debugging. Unless it's possibly fixed in the latest version, one bug fixed there might cause something like this if the moon stands exactly right? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
On 09/25/2013 01:20 PM, Steve Singer wrote: On 09/25/2013 11:08 AM, Andres Freund wrote: On 2013-09-25 11:01:44 -0400, Steve Singer wrote: On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit Rename various freeze multixact variables. And I am even partially guilty for that patch... Rebased patches attached. While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form: unexpected duplicate for tablespace X relfilenode X Any chance you could provide a setup to reproduce the error? The steps to build a setup that should reproduce this error are:
1. Apply the attached patch on top of your logical replication branch so my pg_decode_init would know if it was being called as part of an INIT_REPLICATION or START_REPLICATION. Unless I have misunderstood something you probably will want to merge this fix in.
2. Get my WIP for adding logical support to slony from: g...@github.com:ssinger/slony1-engine.git branch logical_repl (4af1917f8418a) (My code changes to slony are more prototype level code quality than production code quality)
3. cd slony1-engine ./configure --with-pgconfigdir=/usr/local/pg94wal/bin (or whatever) make make install
4. Grab the clustertest framework JAR from https://github.com/clustertest/clustertest-framework and build up a clustertest jar file
5. Create a file slony1-engine/clustertest/conf/java.conf that contains the path to the above JAR file as a shell variable assignment: i.e. CLUSTERTESTJAR=/home/ssinger/src/clustertest/clustertest_git/build/jar/clustertest-coordinator.jar
6. cp clustertest/conf/disorder.properties.sample clustertest/conf/disorder.properties edit disorder.properties to have the proper values for your environment. All 6 databases can point at the same postgres instance, this test will only actually use 2 of them (so far).
7.
Run the test cd clustertest ./run_all_disorder_tests.sh This involves having the slon connect to the walsender on the database test1 and replicate the data into test2 (which is a different database on the same postmaster) If this setup seems like too much effort I can request one of the commitfest VM's from Josh and get everything setup there for you. Steve Any ideas? I'll look into it. Could you provide any context to what you're doing that's being decoded? I've determined that in this test the walsender seems to be hitting this when it is decoding the transactions that are behind the slonik commands to add tables to replication (set add table, set add sequence). This is before the SUBSCRIBE SET is submitted. I've also noticed something else that is strange (but might be unrelated). If I stop my slon process and restart it I get messages like: WARNING: Starting logical replication from 0/a9321360 ERROR: cannot stream from 0/A9321360, minimum is 0/A9320B00 Where 0/A9321360 was sent in the last packet my slon received from the walsender before the restart. If I force it to restart replication from 0/A9320B00 I see data rows that I appear to have already seen before the restart. I think this is happening because my slon was killed after it processed the data for 0/A9320B00 but before it sent the feedback message. Is this expected? Greetings, Andres Freund
Re: [HACKERS] logical changeset generation v6
On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit Rename various freeze multixact variables. And I am even partially guilty for that patch... Rebased patches attached. While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form: unexpected duplicate for tablespace X relfilenode X The call stack is:
HeapTupleSatisfiesMVCCDuringDecoding
heap_hot_search_buffer
index_fetch_heap
index_getnext
systable_getnext
RelidByRelfilenode
ReorderBufferCommit
DecodeCommit
. . . I am working off something based on your version e0acfeace6d695c229efd5d78041a1b734583431 Any ideas? Greetings, Andres Freund
Re: [HACKERS] logical changeset generation v6
On 2013-09-25 11:01:44 -0400, Steve Singer wrote: On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit Rename various freeze multixact variables. And I am even partially guilty for that patch... Rebased patches attached. While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form: unexpected duplicate for tablespace X relfilenode X Any chance you could provide a setup to reproduce the error? Any ideas? I'll look into it. Could you provide any context to what you're doing that's being decoded? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
On 09/25/2013 11:08 AM, Andres Freund wrote: On 2013-09-25 11:01:44 -0400, Steve Singer wrote: On 09/17/2013 10:31 AM, Andres Freund wrote: This patch set now fails to apply because of the commit Rename various freeze multixact variables. And I am even partially guilty for that patch... Rebased patches attached. While testing the logical replication changes against my WIP logical slony I am sometimes getting error messages from the WAL sender of the form: unexpected duplicate for tablespace X relfilenode X Any chance you could provide a setup to reproduce the error? The steps to build a setup that should reproduce this error are:
1. Apply the attached patch on top of your logical replication branch so my pg_decode_init would know if it was being called as part of an INIT_REPLICATION or START_REPLICATION. Unless I have misunderstood something you probably will want to merge this fix in.
2. Get my WIP for adding logical support to slony from: g...@github.com:ssinger/slony1-engine.git branch logical_repl (4af1917f8418a) (My code changes to slony are more prototype level code quality than production code quality)
3. cd slony1-engine ./configure --with-pgconfigdir=/usr/local/pg94wal/bin (or whatever) make make install
4. Grab the clustertest framework JAR from https://github.com/clustertest/clustertest-framework and build up a clustertest jar file
5. Create a file slony1-engine/clustertest/conf/java.conf that contains the path to the above JAR file as a shell variable assignment: i.e. CLUSTERTESTJAR=/home/ssinger/src/clustertest/clustertest_git/build/jar/clustertest-coordinator.jar
6. cp clustertest/conf/disorder.properties.sample clustertest/conf/disorder.properties edit disorder.properties to have the proper values for your environment. All 6 databases can point at the same postgres instance, this test will only actually use 2 of them (so far).
7.
Run the test cd clustertest ./run_all_disorder_tests.sh This involves having the slon connect to the walsender on the database test1 and replicate the data into test2 (which is a different database on the same postmaster) If this setup seems like too much effort I can request one of the commitfest VM's from Josh and get everything setup there for you. Steve Any ideas? I'll look into it. Could you provide any context to what you're doing that's being decoded? Greetings, Andres Freund
From 9efae980c357d3b75c6d98204ed75b8d29ed6385 Mon Sep 17 00:00:00 2001
From: Steve Singer ssin...@ca.afilias.info
Date: Mon, 16 Sep 2013 10:21:39 -0400
Subject: [PATCH] set the init parameter to true when performing an INIT REPLICATION
---
 src/backend/replication/walsender.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 2187d96..1b8f289 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -743,7 +743,7 @@ InitLogicalReplication(InitLogicalReplicationCmd *cmd)
 	sendTimeLine = ThisTimeLineID;
 	initStringInfo(&output_message);
-	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, false, InvalidXLogRecPtr,
+	ctx = CreateLogicalDecodingContext(MyLogicalDecodingSlot, true, InvalidXLogRecPtr,
 									   NIL, replay_read_page,
 									   WalSndPrepareWrite, WalSndWriteData);
-- 
1.7.10.4
Re: [HACKERS] logical changeset generation v6
On 2013-09-23 23:12:53 -0400, Robert Haas wrote: What exactly is the purpose of this tool? My impression is that the output of logical replication is a series of function calls to a logical replication plugin, but does that plugin necessarily have to produce an output format that gets streamed to a client via a tool like this? There needs to be a client acking the reception of the data in some form. There's currently two output methods, SQL and walstreamer, but there could easily be more; it's basically two functions you have to write. There are several reasons I think the tool is useful, starting with the fact that it makes the initial use of the feature easier. Writing a client for CopyBoth messages wrapping 'w' style binary messages, with the correct select() loop, isn't exactly trivial. I also think it's actually useful in real scenarios where you want to ship the data to a remote system for auditing purposes. For example, for replication, I'd think you might want the plugin to connect to a remote database and directly shove the data in; That sounds like a bad idea to me. If you pull the data from the remote side, you get the data in a streaming fashion and the latency sensitive part of issuing statements to your local database is done locally. Doing things synchronously like that also makes it way harder to use synchronous_commit = off on the remote side, which is a tremendous efficiency win. If somebody needs something like this, e.g. because they want to replicate into hundreds of shards depending on some key or such, the question I don't have an answer to is how to actually initiate the streaming. Somebody would need to start the logical decoding. for materialized views, we might like to push the changes into delta relations within the source database. Yes, that's not a bad use case, and I think the only thing missing to use output plugins that way is a convenient function to tell up to where data has been received (aka synced to disk, aka applied).
In either case, there's no particular need for any sort of client at all, and in fact it would be much better if none were required. The existence of a tool like pg_receivellog seems to presuppose that the goal is to spit out logical change records as text, but I'm not sure that's actually going to be a very common thing to want to do... It doesn't really rely on anything being text - I've used it with a binary plugin without problems. Obviously you might not want to use -f - but an actual file instead... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
On Tue, Sep 24, 2013 at 4:15 AM, Andres Freund and...@2ndquadrant.com wrote: There needs to be a client acking the reception of the data in some form. There's currently two output methods, SQL and walstreamer, but there could easily be more; it's basically two functions you have to write. There are several reasons I think the tool is useful, starting with the fact that it makes the initial use of the feature easier. Writing a client for CopyBoth messages wrapping 'w' style binary messages, with the correct select() loop, isn't exactly trivial. I also think it's actually useful in real scenarios where you want to ship the data to a remote system for auditing purposes. I have two basic points here: - Requiring a client is a short-sighted design. There's no reason we shouldn't *support* having a client, but IMHO it shouldn't be the only way to use the feature. - Suppose that you use pg_receivellog (or whatever we decide to call it) to suck down logical replication messages. What exactly are you going to do with that data once you've got it? In the case of pg_receivexlog it's quite obvious what you will do with the received files: you'll store them in an archive of some kind and maybe eventually use them for archive recovery, streaming replication, or PITR. But the answer here is a lot less obvious, at least to me. For example, for replication, I'd think you might want the plugin to connect to a remote database and directly shove the data in; That sounds like a bad idea to me. If you pull the data from the remote side, you get the data in a streaming fashion and the latency sensitive part of issuing statements to your local database is done locally. Doing things synchronously like that also makes it way harder to use synchronous_commit = off on the remote side, which is a tremendous efficiency win. This sounds like the voice of experience talking, so I won't argue too much, but I don't think it's central to my point.
And anyhow, even if it is a bad idea, that doesn't mean someone won't want to do it. :-) If somebody needs something like this, e.g. because they want to replicate into hundreds of shards depending on some key or such, the question I don't have an answer to is how to actually initiate the streaming. Somebody would need to start the logical decoding. Sounds like a job for a background worker. It would be pretty swell if you could write a background worker that connects to a logical replication slot and then does whatever. for materialized views, we might like to push the changes into delta relations within the source database. Yes, that's not a bad use case, and I think the only thing missing to use output plugins that way is a convenient function to tell up to where data has been received (aka synced to disk, aka applied). Yes. It feels to me (and I only work here) like the job of the output plugin ought to be to put the data somewhere, and the replication code shouldn't make too many assumptions about where it's actually going. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] logical changeset generation v6
On 2013-09-24 11:04:06 -0400, Robert Haas wrote: - Requiring a client is a short-sighted design. There's no reason we shouldn't *support* having a client, but IMHO it shouldn't be the only way to use the feature. There really aren't many limitations preventing you from doing anything else. - Suppose that you use pg_receivellog (or whatever we decide to call it) to suck down logical replication messages. What exactly are you going to do with that data once you've got it? In the case of pg_receivexlog it's quite obvious what you will do with the received files: you'll store them in an archive of some kind and maybe eventually use them for archive recovery, streaming replication, or PITR. But the answer here is a lot less obvious, at least to me. Well, it's not like it's going to be the only client. But it's a useful one. I don't see this as an argument against pg_receivelogical? Most sites don't use pg_receivexlog either. Not having a consumer of the walsender interface included sounds like a bad idea to me, even if it were only useful for testing. Now, you could argue it should be in /contrib - and I wouldn't argue against that except it shares code with the rest of src/bin/pg_basebackup. If somebody needs something like this, e.g. because they want to replicate into hundreds of shards depending on some key or such, the question I don't know is how to actually initiate the streaming. Somebody would need to start the logical decoding. Sounds like a job for a background worker. It would be pretty swell if you could write a background worker that connects to a logical replication slot and then does whatever. That's already possible. In that case you don't have to connect to a walsender, although doing so would give you some parallelism, one decoding the data, the other processing it ;). There's one use case I do not foresee decoupling from the walsender interface this release though - synchronous logical replication.
There are currently no code changes required to make sync rep work for this, and decoupling sync rep from walsender is too much to bite off in one go. for materialized views, we might like to push the changes into delta relations within the source database. Yes, that's not a bad use case, and I think the only thing missing to use output plugins that way is a convenient function to tell up to where data has been received (aka synced to disk, aka applied). Yes. It feels to me (and I only work here) like the job of the output plugin ought to be to put the data somewhere, and the replication code shouldn't make too many assumptions about where it's actually going. The output plugin just has two functions it calls to send out data, 'prepare_write' and 'write'. The callsite has to provide those callbacks. Two are included: walsender and an SQL SRF. Check the 'test_logical_decoding' commit; it includes the SQL consumer. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
On 09/24/2013 11:21 AM, Andres Freund wrote: Not having a consumer of the walsender interface included sounds like a bad idea to me, even if it were only useful for testing. Now, you could argue it should be in /contrib - and I wouldn't argue against that except it shares code with the rest of src/bin/pg_basebackup. +1 on pg_receivellog (or whatever better name we pick) being somewhere. I found the pg_receivellog code very useful as an example and for debugging/development purposes. It isn't something that I see as useful for the average user, and I think the use-cases it meets are closer to other things we usually put in /contrib Steve
Re: [HACKERS] logical changeset generation v6
On 2013-09-20 14:15:23 -0700, Peter Geoghegan wrote: I have a little bit of feedback that I forgot to mention in my earlier reviews, because I thought it was too trivial then: something about the name pg_receivellog annoys me in a way that the name pg_receivexlog does not. Specifically, it looks like someone meant to type pg_receivelog but fat-fingered it. Yes, you're not the first to dislike it (including me). pg_receivelogical? Protest now or forever hold your peace. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
On Mon, Sep 23, 2013 at 1:46 AM, Andres Freund and...@2ndquadrant.com wrote: pg_receivelogical? Protest now or forever hold your peace. I was thinking pg_receiveloglog, but that works just as well. -- Peter Geoghegan
Re: [HACKERS] logical changeset generation v6
On 2013-09-23 13:47:05 -0300, Alvaro Herrera wrote: Andres Freund escribió: On 2013-09-20 14:15:23 -0700, Peter Geoghegan wrote: I have a little bit of feedback that I forgot to mention in my earlier reviews, because I thought it was too trivial then: something about the name pg_receivellog annoys me in a way that the name pg_receivexlog does not. Specifically, it looks like someone meant to type pg_receivelog but fat-fingered it. Yes, you're not the first to dislike it (including me). pg_receivelogical? Protest now or forever hold your peace. I had proposed pg_recvlogical I still find it weird/inconsistent to have: * pg_receivexlog * pg_recvlogical binaries, even from the same source directory. Why once pg_recv and once pg_receive? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
Andres Freund escribió: On 2013-09-20 14:15:23 -0700, Peter Geoghegan wrote: I have a little bit of feedback that I forgot to mention in my earlier reviews, because I thought it was too trivial then: something about the name pg_receivellog annoys me in a way that the name pg_receivexlog does not. Specifically, it looks like someone meant to type pg_receivelog but fat-fingered it. Yes, you're not the first to dislike it (including me). pg_receivelogical? Protest now or forever hold your peace. I had proposed pg_recvlogical -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
On Mon, Sep 23, 2013 at 9:54 AM, Andres Freund and...@2ndquadrant.com wrote: I still find it weird/inconsistent to have: * pg_receivexlog * pg_recvlogical binaries, even from the same source directory. Why once pg_recv and once pg_receive? +1 -- Peter Geoghegan
Re: [HACKERS] logical changeset generation v6
Andres Freund escribió: On 2013-09-23 13:47:05 -0300, Alvaro Herrera wrote: I had proposed pg_recvlogical I still find it weird/inconsistent to have: * pg_receivexlog * pg_recvlogical binaries, even from the same source directory. Why once pg_recv and once pg_receive? Well. What are the principles we want to follow when choosing a name? Is consistency the first and foremost consideration? To me, that names are exactly consistent is not all that relevant; I prefer a shorter name if it embodies all it means. For that reason I didn't like the receiveloglog suggestion: it's not clear what the two log bits are. To me this suggests that logical should not be shortened. But the recv thing is clear to be receive, isn't it? Enough that it can be shortened without loss of meaning. If we consider that consistency in naming of tools is uber-important, well, obviously my proposal is dead. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
On 9/23/13 12:54 PM, Andres Freund wrote: I still find it weird/inconsistent to have: * pg_receivexlog * pg_recvlogical binaries, even from the same source directory. Why once pg_recv and once pg_receive? It's consistent because they are the same length! (Obviously, this would severely restrict future tool naming.) In all seriousness, I like this naming best so far.
Re: [HACKERS] logical changeset generation v6
On Mon, Sep 23, 2013 at 1:11 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Andres Freund escribió: On 2013-09-23 13:47:05 -0300, Alvaro Herrera wrote: I had proposed pg_recvlogical I still find it weird/inconsistent to have: * pg_receivexlog * pg_recvlogical binaries, even from the same source directory. Why once pg_recv and once pg_receive? Well. What are the principles we want to follow when choosing a name? Is consistency the first and foremost consideration? To me, that names are exactly consistent is not all that relevant; I prefer a shorter name if it embodies all it means. For that reason I didn't like the receiveloglog suggestion: it's not clear what the two log bits are. To me this suggests that logical should not be shortened. But the recv thing is clear to be receive, isn't it? Enough that it can be shortened without loss of meaning. If we consider that consistency in naming of tools is uber-important, well, obviously my proposal is dead. What exactly is the purpose of this tool? My impression is that the output of logical replication is a series of function calls to a logical replication plugin, but does that plugin necessarily have to produce an output format that gets streamed to a client via a tool like this? For example, for replication, I'd think you might want the plugin to connect to a remote database and directly shove the data in; for materialized views, we might like to push the changes into delta relations within the source database. In either case, there's no particular need for any sort of client at all, and in fact it would be much better if none were required. The existence of a tool like pg_receivellog seems to presuppose that the goal is to spit out logical change records as text, but I'm not sure that's actually going to be a very common thing to want to do... 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] logical changeset generation v6
On Mon, Sep 23, 2013 at 8:12 PM, Robert Haas robertmh...@gmail.com wrote: The existence of a tool like pg_receivellog seems to presuppose that the goal is to spit out logical change records as text, but I'm not sure that's actually going to be a very common thing to want to do... Sure, but I think it's still worth having, for debugging purposes and so on. Perhaps the incorrect presupposition is that it deserves to live in /bin and not /contrib. Also, even though the tool is derivative of pg_receivexlog, its reason for existing is sufficiently different that maybe it deserves an entirely distinct name. On the other hand, precisely because it's derivative of receivelog/pg_receivexlog, it kind of makes sense to group them together like that. So I don't know. -- Peter Geoghegan
Re: [HACKERS] logical changeset generation v6
On 09/20/2013 06:33 AM, Andres Freund wrote: Hi, The points I find daunting are the semantics, like: * How do we control whether a standby is allowed to prevent WAL file removal. What if archiving is configured? * How do we control whether a standby is allowed to peg xmin? * How long do we peg an xmin/wal file removal if the standby is gone * How does the user interface look to remove a slot if a standby is gone * How do we decide/control which commands use a slot in which cases? I think we are going to want to be flexible enough to support users with a couple of different use-cases. * Some people will want to keep xmin pegged and prevent WAL removal so a standby with a slot can always catch up. * Most people will want to say keep X megabytes of WAL (if needed by a behind slot) and keep xmin pegged so that the WAL can be consumed by a logical plugin. I can see us also implementing a restore_command that the walsender could use to get archived segments, but for logical replication xmin would still need to be low enough. I don't think the current patch set is incompatible with us later implementing any of the above. I'd rather see us focus on getting the core functionality committed and worry about a good interface for managing slots later. Greetings, Andres Freund Steve
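The "keep X megabytes of WAL (if needed by a behind slot)" policy sketched above amounts to a simple cutoff computation. The following is a hypothetical Python model of that idea — the function and parameter names are invented for illustration and are not PostgreSQL's (much later, PostgreSQL grew a similar knob in the form of max_slot_wal_keep_size):

```python
# Toy model of the proposed retention policy: replication slots hold WAL
# back, but a lagging slot may never hold removal back further than
# max_slot_keep_bytes behind the current insert position.
def wal_keep_cutoff(current_lsn, slot_restart_lsns, max_slot_keep_bytes):
    """Return the oldest LSN that must be kept (WAL before it may be removed)."""
    cutoff = current_lsn
    if slot_restart_lsns:
        cutoff = min(cutoff, min(slot_restart_lsns))
    # Cap how far a behind slot can pin WAL retention:
    floor = current_lsn - max_slot_keep_bytes
    return max(cutoff, floor)

# A slot 3000 bytes behind with a 1000-byte cap: only the last 1000 are kept.
print(wal_keep_cutoff(10_000, [7_000], 1_000))  # 9000
```

A slot within the cap pins retention exactly at its restart position; with no slots at all, everything up to the current position is removable.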
Re: [HACKERS] logical changeset generation v6
Hi, On 2013-09-19 12:05:35 -0400, Robert Haas wrote: No question. I'm not saying that that optimization shouldn't go in right after the main patch does, but IMHO right now there are too many things going in the 0004 patch to discuss them all simultaneously. I'd like to find a way of splitting this up that will let us block-and-tackle individual pieces of it, even if we end up committing them all one right after the other. Fine with me. I was criticized for splitting up stuff too much before ;) Expect a finer-grained series. But that raises an interesting question: why is the overhead so bad? I mean, this shouldn't be any worse than having a series of transactions running concurrently with pgbench that take a snapshot and hold it for as long as it takes the decoding process to decode the most-recently committed transaction. Pgbench really slows down scarily if there are some slightly longer running transactions around... Is the issue here that we can't advance xmin until we've fsync'd the fruits of decoding down to disk? Basically yes. We only advance the xmin of the slot so far that we could still build a valid snapshot to decode the first transaction not confirmed to have been synced to disk by the client. If so, that's mighty painful. But we'd really only need to hold back xmin in that situation when some catalog change has occurred meanwhile, which for pgbench means never. So something seems fishy here. It's less simple than that. We need to protect against concurrent DDL producing deleted rows that we will still need. We need HeapTupleSatisfiesVacuum() to return HEAPTUPLE_RECENTLY_DEAD not HEAPTUPLE_DEAD for such rows, right? The way to do that is to guarantee that if TransactionIdDidCommit(xmax) is true, TransactionIdPrecedes(xmax, OldestXmin) is also true. So, we need to peg OldestXmin (as passed to HTSV) to the xid of the oldest transaction we're still decoding. 
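The invariant being discussed — peg OldestXmin at the oldest xid still being decoded, so that catalog rows deleted after that horizon are reported recently-dead rather than dead — can be illustrated with a toy Python model. This is not PostgreSQL's C code; the function is a drastically simplified stand-in (real xid comparisons wrap around, which is ignored here):

```python
# Crude stand-in for HeapTupleSatisfiesVacuum's verdict on a deleted tuple.
HEAPTUPLE_DEAD = "DEAD"                    # safe for VACUUM/pruning to reclaim
HEAPTUPLE_RECENTLY_DEAD = "RECENTLY_DEAD"  # deleted, but still needed by someone

def satisfies_vacuum(xmax_committed, xmax, oldest_xmin):
    if not xmax_committed:
        return "INSERT_IN_PROGRESS"  # deleter still running; not relevant here
    # Only tuples whose deleter committed before OldestXmin are truly dead.
    return HEAPTUPLE_DEAD if xmax < oldest_xmin else HEAPTUPLE_RECENTLY_DEAD

# Decoding is still working on xid 100, so the slot pins OldestXmin there.
# A concurrent DDL transaction (xid 105) deleted a catalog row and committed:
print(satisfies_vacuum(True, 105, 100))  # RECENTLY_DEAD -> row is kept
# Without the peg, OldestXmin might already have advanced past the deleter:
print(satisfies_vacuum(True, 105, 110))  # DEAD -> row could be pruned away
```

The point of the peg is exactly the difference between those two calls: as long as OldestXmin stays at or before the oldest decoded xid, no row version the decoder might still need can be reclaimed.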
I am not sure how you could do that if somewhere in the future DDL has started, since there's no interlock preventing anyone from doing so. - It still bothers me that we're going to have mandatory slots for logical replication and no slots for physical replication. If people think this needs to be a general facility from the start, I can be convinced that way, but I think there's so much to discuss around the semantics and different use cases that I'd much prefer to discuss that later. I'm worried that if we don't know how the physical replication slots are going to work, they'll end up being randomly different from the logical replication slots, and that'll be an API wart which we'll have a hard time getting rid of later. Hm. I actually think that minus some s/Logical//g and a mv, there won't be much need to change on the slot interface itself. What we need for physical rep is basically to a) store the position up to where the primary has fsynced the WAL b) store the xmin horizon the standby currently has. Sure, we can store more stats (most of pg_stat_replication, perhaps some more) but that's not functionally critical and not hard to extend. The points I find daunting are the semantics, like: * How do we control whether a standby is allowed to prevent WAL file removal. What if archiving is configured? * How do we control whether a standby is allowed to peg xmin? * How long do we peg an xmin/wal file removal if the standby is gone * How does the user interface look to remove a slot if a standby is gone * How do we decide/control which commands use a slot in which cases? - What is the purpose of (Un)SuspendDecodingSnapshots? It seems that should be explained somewhere. I have my doubts about how safe that is. I'll document the details if they aren't right now. Consider what happens if somebody does something like: VACUUM FULL pg_am;. If we were to build the relation descriptor of pg_am in an historical snapshot, as you coin it, we'd have the wrong filenode in there. 
And consequently any future lookups in pg_am will open a file that doesn't exist. That problem only exists for non-nailed relations that are accessed during decoding. But if it's some user table flagged with the terribly-named treat_as_catalog_table flag, then they could have not only changed the relfilenode but also the tupledesc. And then you can't just wave your hands at the problem. Heh. Well caught. There's a comment about that somewhere... Those are problematic; my plan so far is to throw my hands up and forbid alter tables that rewrite those. I know you don't like that flag and especially its name. I am open to suggestions to a) rename it b) find a better solution. I am pretty sure a) is possible but I have severe doubts about any realistic b). Yes, I don't like it either. I am not sure what to replace it with though. It's easy enough to fit something in GetCatalogSnapshot() and I don't have a problem with that, but I am less happy with adding code like that to GetSnapshotData() for
Re: [HACKERS] logical changeset generation v6
On Thu, Sep 19, 2013 at 7:43 AM, Andres Freund and...@2ndquadrant.com wrote: More generally, the thing that bugs me about this approach is that logical replication is not really special, but what you've done here MAKES it special. There are plenty of other situations where we are too aggressive about holding back xmin. A single-statement transaction that selects from a single table holds back xmin for the entire cluster, and that is Very Bad. It would probably be unfair to say, well, you have to solve that problem first. But on the other hand, how do we know that the approach you've adopted here isn't going to make the more general problem harder to solve? It seems invasive at a pretty low level. I agree that it's invasive, but I am doubtful that pegging the xmin in a more granular fashion precludes this kind of optimization. We might have to generalize what Andres has done, which could mean eventually throwing it out and starting from scratch, but I have a hard time seeing how that implies an appreciable cost above solving the general problem first (now that Andres has already implemented the RecentGlobalDataXmin thing). As I'm sure you appreciate, the cost of doing the opposite - of solving the general problem first - may be huge: waiting another release for logical changeset generation. The reason why I think it's actually different is that the user actually has control over how long transactions are running on the primary. They don't really control how fast a replication consumer consumes and how often it sends feedback messages. Right. That's about what I said last year. I find the following analogy useful: A logical changeset generation implementation without RecentGlobalDataXmin is kind of like an old-fashioned nuclear reactor, like the one they had at Chernobyl. Engineers have to actively work in order to prevent it from overheating. However, an implementation with RecentGlobalDataXmin is like a modern, much safer nuclear reactor. 
Engineers have to actively work to keep the reactor heated. Which is to say, with RecentGlobalDataXmin a standby that dies cannot bloat the master too much (almost as with hot_standby_feedback - that too requires active participation from the standby to do harm to the master). Without RecentGlobalDataXmin, the core system and the plugin at the very least need to worry about that case when a standby dies. I have a little bit of feedback that I forgot to mention in my earlier reviews, because I thought it was too trivial then: something about the name pg_receivellog annoys me in a way that the name pg_receivexlog does not. Specifically, it looks like someone meant to type pg_receivelog but fat-fingered it. -- Peter Geoghegan
Re: [HACKERS] logical changeset generation v6
On Tue, Sep 17, 2013 at 10:31 AM, Andres Freund and...@2ndquadrant.com wrote: Rebased patches attached. I spent a bit of time looking at these patches yesterday and today. It seems to me that there's a fair amount of stylistic cleanup that is still needed, and some pretty bad naming choices, and some FIXMEs that should probably be fixed, but for an initial review email it seems more helpful to focus on high-level design, so here goes. - Looking specifically at the 0004 patch, I think that the RecentGlobalDataXmin changes are logically separate from the rest of the patch, and that if we're going to commit them at all, it should be separate from the rest of this. I think this is basically a performance optimization. AFAICS, there's no correctness problem with the idea of continuing to maintain a single RecentGlobalXmin; it's just that you'll potentially end up with quite a lot of bloat. But that's not an argument that those two things HAVE to be committed together; either could be done first, and independently of the other. Also, these changes are quite complex and it's difficult to form a judgement as to whether this idea is safe when they are intermingled with the rest of the logical replication stuff. More generally, the thing that bugs me about this approach is that logical replication is not really special, but what you've done here MAKES it special. There are plenty of other situations where we are too aggressive about holding back xmin. A single-statement transaction that selects from a single table holds back xmin for the entire cluster, and that is Very Bad. It would probably be unfair to say, well, you have to solve that problem first. But on the other hand, how do we know that the approach you've adopted here isn't going to make the more general problem harder to solve? It seems invasive at a pretty low level. 
I think we should at least spend some time thinking about what *general* solutions to this problem would look like and then decide whether this approach is likely to be forward-compatible with those solutions. - There are no changes to the doc directory. Obviously, if you're going to add a new value for the wal_level GUC, it's gonna need to be documented. Similarly, pg_receivellog needs to be documented. In all likelihood, you also need a whole chapter providing general background on this technology. A couple of README files is not going to do it, and those files aren't suitable for check-in anyway (e.g. DESIGN.txt makes reference to a URL where the current version of some patch can be found; that's not appropriate for permanent documentation). But aside from that, what we really need here is user documentation, not developer documentation. I can perhaps pass judgement on whether the guts of this functionality do things that are fundamentally unsafe, but whether the user interface is good or bad is a question that deserves broader input, and without documentation, most people aren't going to understand it well enough to know whether they like it. And TBH, *I* don't really want to reverse-engineer what pg_receivellog does from a read-through of the code, either. - Given that we don't reassemble transactions until commit time, why do we need to ensure that XIDs are logged before their sub-XIDs appear in WAL? As I've said previously, I actually think that on-the-fly reassembly is probably going to turn out to be very important. But if we're not getting that, do we really need this? Also, while I'm trying to keep this email focused on high-level concerns, I have to say that guaranteedlyLogged has got to be one of the worst variable names I've ever seen, starting (but not ending) with the fact that guaranteedly is not a word. I'm also tempted to say that all of the wal_level=logical changes should be split out as their own patch, separate from the decoding stuff. 
Personally, I would find that much easier to review, although I admit it's less critical here than for the RecentGlobalDataXmin stuff. - If XLOG_HEAP_INPLACE is not decoded, doesn't that mean that this facility can't be used to replicate a system catalog table? Is that a restriction we should enforce/document somehow? - The new code is rather light on comments. decode.c is extremely light. For example, consider the function DecodeAbort(), which contains the following comment: + /* +* this is a bit grotty, but if we're faking an abort we've already gone +* through +*/ Well, I have no idea what that means. I'm sure you do, but I bet the next person who isn't you that tries to understand this probably won't. It's also probably true that I could figure it out if I spent more time on it, but I think the point of comments is to keep the amount of time that must be spent trying to understand code to a manageable level. Generally, I'd suggest that any non-trivial functions in these files should have a header comment explaining what their job is; e.g. for DecodeStandbyOp you
Re: [HACKERS] logical changeset generation v6
Hi Robert, On 2013-09-19 10:02:31 -0400, Robert Haas wrote: On Tue, Sep 17, 2013 at 10:31 AM, Andres Freund and...@2ndquadrant.com wrote: Rebased patches attached. I spent a bit of time looking at these patches yesterday and today. It seems to me that there's a fair amount of stylistic cleanup that is still needed, and some pretty bad naming choices, and some FIXMEs that should probably be fixed, but for an initial review email it seems more helpful to focus on high-level design, so here goes. Thanks for looking at it. Yes, I think the high-level stuff is the important bit. As you note, the documentation needs to be written and that's certainly not a small task. But doing so before the high-level design is agreed upon makes it too likely that it will need to be entirely scrapped. - Looking specifically at the 0004 patch, I think that the RecentGlobalDataXmin changes are logically separate from the rest of the patch, and that if we're going to commit them at all, it should be separate from the rest of this. I think this is basically a performance optimization. AFAICS, there's no correctness problem with the idea of continuing to maintain a single RecentGlobalXmin; it's just that you'll potentially end up with quite a lot of bloat. But that's not an argument that those two things HAVE to be committed together; either could be done first, and independently of the other. Also, these changes are quite complex and it's difficult to form a judgement as to whether this idea is safe when they are intermingled with the rest of the logical replication stuff. Up until v3 the RecentGlobalDataXmin stuff wasn't included and reviewers (primarily Peter G. on -hackers and Greg Stark at pgconf.eu) remarked on that and considered it critical. I argued for a considerable amount of time that it shouldn't be done in an initial patch and then gave in. They have a point though, if you e.g. 
replicate a pgbench -j16 workload the addition of RecentGlobalDataXmin reduces the performance impact of replication from about 60% to less than 5% in my measurements. Turns out heap pruning is damn critical for that kind of workload. More generally, the thing that bugs me about this approach is that logical replication is not really special, but what you've done here MAKES it special. There are plenty of other situations where we are too aggressive about holding back xmin. A single-statement transaction that selects from a single table holds back xmin for the entire cluster, and that is Very Bad. It would probably be unfair to say, well, you have to solve that problem first. But on the other hand, how do we know that the approach you've adopted here isn't going to make the more general problem harder to solve? It seems invasive at a pretty low level. The reason why I think it's actually different is that the user actually has control over how long transactions are running on the primary. They don't really control how fast a replication consumer consumes and how often it sends feedback messages. I think we should at least spend some time thinking about what *general* solutions to this problem would look like and then decide whether this approach is likely to be forward-compatible with those solutions. I thought about the general case for a good bit and decided that all solutions that work in a more general scenario are complex enough that I don't want to implement them. And I don't really see any backward compatibility concerns here - removing the logic of using a separate horizon for user tables in contrast to system tables is pretty trivial and shouldn't have any external effect. Except pegging the horizon more, but that's what the new approach would fix, right? - Given that we don't reassemble transactions until commit time, why do we need to ensure that XIDs are logged before their sub-XIDs appear in WAL? 
Currently it's important to know where the oldest still-alive transaction started, to determine from where we need to restart decoding. That's done by keeping an LSN-ordered list of in-progress toplevel transactions. The above invariant makes it cheap to maintain that list. As I've said previously, I actually think that on-the-fly reassembly is probably going to turn out to be very important. But if we're not getting that, do we really need this? It's also preparatory for supporting that. I agree that it's pretty important, but after actually having implemented a replication solution using this, I still think that most use cases won't use it when available. I plan to work on implementing that. Also, while I'm trying to keep this email focused on high-level concerns, I have to say that guaranteedlyLogged has got to be one of the worst variable names I've ever seen, starting (but not ending) with the fact that guaranteedly is not a word. I'm also tempted to say that all of the wal_level=logical changes should be split out as their own patch, separate from the
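The bookkeeping Andres describes — an LSN-ordered list of in-progress toplevel transactions, whose head determines the decoding restart point — can be sketched as a toy Python model. The class and method names here are invented for illustration (the real structure lives in PostgreSQL's C reorder-buffer/snapshot-builder code); the key property modeled is that because an xid is logged before any of its sub-XIDs appears in WAL, tracking toplevel transactions by first-appearance LSN is sufficient:

```python
import heapq

class InProgressTracker:
    """Track toplevel transactions ordered by the LSN where they first appeared."""
    def __init__(self):
        self._started = []   # min-heap of (start_lsn, xid)
        self._running = set()

    def xact_start(self, xid, lsn):
        heapq.heappush(self._started, (lsn, xid))
        self._running.add(xid)

    def xact_end(self, xid):
        self._running.discard(xid)

    def restart_lsn(self, current_lsn):
        # Lazily discard finished transactions from the front of the heap.
        while self._started and self._started[0][1] not in self._running:
            heapq.heappop(self._started)
        # Restart from the start of the oldest in-progress transaction,
        # or from the current position if nothing is in progress.
        return self._started[0][0] if self._started else current_lsn

tracker = InProgressTracker()
tracker.xact_start(7, 0x100)
tracker.xact_start(8, 0x180)
tracker.xact_end(7)                      # xid 7 commits
print(hex(tracker.restart_lsn(0x200)))   # 0x180: must re-read from xid 8's start
```

Once xid 8 also finishes, the restart point simply catches up to the current LSN — which is why maintaining this list cheaply matters.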
Re: [HACKERS] logical changeset generation v6
On Thu, Sep 19, 2013 at 10:43 AM, Andres Freund and...@2ndquadrant.com wrote: - Looking specifically at the 0004 patch, I think that the RecentGlobalDataXmin changes are logically separate from the rest of the patch, and that if we're going to commit them at all, it should be separate from the rest of this. I think this is basically a performance optimization. AFAICS, there's no correctness problem with the idea of continuing to maintain a single RecentGlobalXmin; it's just that you'll potentially end up with quite a lot of bloat. But that's not an argument that those two things HAVE to be committed together; either could be done first, and independently of the other. Also, these changes are quite complex and it's difficult to form a judgement as to whether this idea is safe when they are intermingled with the rest of the logical replication stuff. Up until v3 the RecentGlobalDataXmin stuff wasn't included and reviewers (primarily Peter G. on -hackers and Greg Stark at pgconf.eu) remarked on that and considered it critical. I argued for a considerable amount of time that it shouldn't be done in an initial patch and then gave in. They have a point though, if you e.g. replicate a pgbench -j16 workload the addition of RecentGlobalDataXmin reduces the performance impact of replication from about 60% to less than 5% in my measurements. Turns out heap pruning is damn critical for that kind of workload. No question. I'm not saying that that optimization shouldn't go in right after the main patch does, but IMHO right now there are too many things going in the 0004 patch to discuss them all simultaneously. I'd like to find a way of splitting this up that will let us block-and-tackle individual pieces of it, even if we end up committing them all one right after the other. But that raises an interesting question: why is the overhead so bad? 
I mean, this shouldn't be any worse than having a series of transactions running concurrently with pgbench that take a snapshot and hold it for as long as it takes the decoding process to decode the most-recently committed transaction. Is the issue here that we can't advance xmin until we've fsync'd the fruits of decoding down to disk? If so, that's mighty painful. But we'd really only need to hold back xmin in that situation when some catalog change has occurred meanwhile, which for pgbench means never. So something seems fishy here. I thought about the general case for a good bit and decided that all solutions that work in a more general scenario are complex enough that I don't want to implement them. And I don't really see any backward compatibility concerns here - removing the logic of using a separate horizon for user tables in contrast to system tables is pretty trivial and shouldn't have any external effect. Except pegging the horizon more, but that's what the new approach would fix, right? Hmm, maybe. Also, while I'm trying to keep this email focused on high-level concerns, I have to say that guaranteedlyLogged has got to be one of the worst variable names I've ever seen, starting (but not ending) with the fact that guaranteedly is not a word. I'm also tempted to say that all of the wal_level=logical changes should be split out as their own patch, separate from the decoding stuff. Personally, I would find that much easier to review, although I admit it's less critical here than for the RecentGlobalDataXmin stuff. I can do that again and it actually was that way in the past. But there's no user for it before the later patch and it's hard to understand the reasoning for the changed wal logging separately, that's why I merged it at some point. OK. If I'm committing it, I'd prefer to handle that piece separately, if possible. - If XLOG_HEAP_INPLACE is not decoded, doesn't that mean that this facility can't be used to replicate a system catalog table? 
Is that a restriction we should enforce/document somehow? Currently catalog tables aren't replicated, yes. They simply are skipped during decoding. XLOG_HEAP_INPLACE isn't the primary reason for that though. Do you see a use case for it? I can imagine someone wanting to do it, but I think we can live with it not being supported. - It still bothers me that we're going to have mandatory slots for logical replication and no slots for physical replication. Why are slots mandatory in one case and not even allowable in the other? Is it sensible to think about slotless logical replication - or maybe I should say, is it any LESS sensible than slotless physical replication? Well, as you know, I do want to have slots for physical replication as well. But there actually is a fundamental difference why we need it for logical rep and not for physical: In physical replication, if the xmin progresses too far, client queries will be cancelled. Annoying but not fatal. In logical replication we will not be able to continue replicating since we cannot decode the WAL
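The asymmetry Andres points out rests on the same horizon computation in both cases: the cluster-wide xmin that vacuum honours is the minimum over live backends and any slots, so a single stalled consumer pins it. Here is a toy Python model of that computation — the function name is invented for illustration and is not PostgreSQL's actual GetOldestXmin() implementation:

```python
# Toy model: the effective vacuum horizon is the minimum xmin over all
# live backends and all replication slots. Backend snapshots come and go
# quickly; a slot whose consumer has stalled pins the horizon indefinitely,
# which is exactly why slot semantics (when may it pin, for how long,
# how is it removed) are the hard part of the design.
def compute_oldest_xmin(backend_xmins, slot_xmins):
    candidates = [x for x in list(backend_xmins) + list(slot_xmins) if x is not None]
    return min(candidates) if candidates else None

# Backends are at xids 1000 and 1003, but a stalled slot still needs 500:
print(compute_oldest_xmin([1000, 1003], [500]))  # 500
```

For physical replication, losing rows below the standby's horizon merely cancels standby queries; for logical decoding, losing catalog rows below the slot's horizon makes the WAL permanently undecodable — hence slots being mandatory there.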
Re: [HACKERS] logical changeset generation v6
On Tue, Sep 17, 2013 at 11:31 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-09-17 09:45:28 -0400, Peter Eisentraut wrote: On 9/15/13 11:30 AM, Andres Freund wrote: On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote: On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote: Attached you can find the newest version of the logical changeset generation patchset. You probably have bigger things to worry about, but please check the results of cpluspluscheck, because some of the header files don't include header files they depend on. Hm. I tried to get that right, but it's been a while since I last checked. I don't regularly use cpluspluscheck because it doesn't work in VPATH builds... We really need to fix that. I'll push a fix for that to the git tree, don't think that's worth a resend in itself. This patch set now fails to apply because of the commit Rename various freeze multixact variables. And I am even partially guilty for that patch... Rebased patches attached. When I applied all the patches and did the compile, I got the following error: gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -I. -I../../../../src/include -D_GNU_SOURCE -c -o snapbuild.o snapbuild.c snapbuild.c:187: error: redefinition of typedef 'SnapBuild' ../../../../src/include/replication/snapbuild.h:45: note: previous declaration of 'SnapBuild' was here make[4]: *** [snapbuild.o] Error 1 When I applied only 0001-wal_decoding-Allow-walsender-s-to-connect-to-a-speci.patch, compiled the source, and set up asynchronous replication, I got a segmentation fault. LOG: server process (PID 12777) was terminated by signal 11: Segmentation fault Regards, -- Fujii Masao
Re: [HACKERS] logical changeset generation v6
On 9/15/13 11:30 AM, Andres Freund wrote: On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote: On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote: Attached you can find the newest version of the logical changeset generation patchset. You probably have bigger things to worry about, but please check the results of cpluspluscheck, because some of the header files don't include header files they depend on. Hm. I tried to get that right, but it's been a while since I last checked. I don't regularly use cpluspluscheck because it doesn't work in VPATH builds... We really need to fix that. I'll push a fix for that to the git tree, don't think that's worth a resend in itself. This patch set now fails to apply because of the commit Rename various freeze multixact variables.
Re: [HACKERS] logical changeset generation v6
What's with 0001-Improve-regression-test-for-8410.patch? Did you mean to include that?
Re: [HACKERS] logical changeset generation v6
On 2013-09-15 10:03:54 -0400, Peter Eisentraut wrote: What's with 0001-Improve-regression-test-for-8410.patch? Did you mean to include that? Gah, no. That's already committed and unrelated. Stupid wildcard. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] logical changeset generation v6
On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote: Attached you can find the newest version of the logical changeset generation patchset. You probably have bigger things to worry about, but please check the results of cpluspluscheck, because some of the header files don't include header files they depend on. (I guess that's really pgcompinclude's job to find out, but cpluspluscheck seems to be easier to use.)
Re: [HACKERS] logical changeset generation v6
On 2013-09-15 11:20:20 -0400, Peter Eisentraut wrote: On Sat, 2013-09-14 at 22:49 +0200, Andres Freund wrote: Attached you can find the newest version of the logical changeset generation patchset. You probably have bigger things to worry about, but please check the results of cpluspluscheck, because some of the header files don't include header files they depend on. Hm. I tried to get that right, but it's been a while since I last checked. I don't regularly use cpluspluscheck because it doesn't work in VPATH builds... We really need to fix that. I'll push a fix for that to the git tree, don't think that's worth a resend in itself. Thanks, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services