Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-03-29 Thread Craig Ringer
On 15 March 2016 at 04:48, Andres Freund  wrote:

> On 2016-01-31 05:09:33 +0800, Craig Ringer wrote:
> > On 29 January 2016 at 18:16, Andres Freund  wrote:
> >
> > > Hi,
> > >
> > > so, I'm reviewing the output of:
> > >
> >
> > Thankyou very much for the review.
>
> Afaics you've not posted an updated version of this? Any chance you
> could?
>
>
Here's v5.

It still needs json support to be removed and the plpgsql hooks replaced,
but the rest should be sorted out.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e1fb318e4c25b93770c53425277f3efa83ccb618 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 23 Mar 2016 09:03:04 +0800
Subject: [PATCH] Timeline following for logical decoding

Allow logical slots to follow timeline switches

Make logical replication slots timeline-aware, so replay can
continue from a historical timeline onto the server's current
timeline.

This is required to make failover slots possible and may also
be used by extensions that CreateReplicationSlot on a standby
and replay from that slot once the replica is promoted.

This does NOT add support for replaying from a logical slot on
a standby or for syncing slots to replicas.
---
 src/backend/access/transam/xlogreader.c|  51 +++-
 src/backend/access/transam/xlogutils.c | 260 --
 src/backend/replication/logical/logicalfuncs.c |  33 ++-
 src/include/access/xlogreader.h|  37 ++-
 src/include/access/xlogutils.h |   2 +-
 src/test/modules/Makefile  |   1 +
 src/test/modules/test_slot_timelines/.gitignore|   3 +
 src/test/modules/test_slot_timelines/Makefile  |  22 ++
 src/test/modules/test_slot_timelines/README|  19 ++
 .../expected/load_extension.out|  19 ++
 .../test_slot_timelines/sql/load_extension.sql |   7 +
 .../test_slot_timelines--1.0.sql   |  16 ++
 .../test_slot_timelines/test_slot_timelines.c  | 133 +
 .../test_slot_timelines/test_slot_timelines.conf   |   2 +
 .../test_slot_timelines.control|   5 +
 src/test/recovery/Makefile |   2 +
 .../recovery/t/006_logical_decoding_timelines.pl   | 304 +
 17 files changed, 865 insertions(+), 51 deletions(-)
 create mode 100644 src/test/modules/test_slot_timelines/.gitignore
 create mode 100644 src/test/modules/test_slot_timelines/Makefile
 create mode 100644 src/test/modules/test_slot_timelines/README
 create mode 100644 src/test/modules/test_slot_timelines/expected/load_extension.out
 create mode 100644 src/test/modules/test_slot_timelines/sql/load_extension.sql
 create mode 100644 src/test/modules/test_slot_timelines/test_slot_timelines--1.0.sql
 create mode 100644 src/test/modules/test_slot_timelines/test_slot_timelines.c
 create mode 100644 src/test/modules/test_slot_timelines/test_slot_timelines.conf
 create mode 100644 src/test/modules/test_slot_timelines/test_slot_timelines.control
 create mode 100644 src/test/recovery/t/006_logical_decoding_timelines.pl

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index fcb0872..b7d249c 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -10,6 +10,9 @@
  *
  * NOTES
  *		See xlogreader.h for more notes on this facility.
+ *
+ *		This file is compiled as both front-end and backend code, so it
+ *		may not use ereport, server-defined static variables, etc.
  *-
  */
 
@@ -116,6 +119,11 @@ XLogReaderAllocate(XLogPageReadCB pagereadfunc, void *private_data)
 		return NULL;
 	}
 
+#ifndef FRONTEND
+	/* Will be loaded on first read */
+	state->timelineHistory = NIL;
+#endif
+
 	return state;
 }
 
@@ -135,6 +143,10 @@ XLogReaderFree(XLogReaderState *state)
 	pfree(state->errormsg_buf);
 	if (state->readRecordBuf)
 		pfree(state->readRecordBuf);
+#ifndef FRONTEND
+	if (state->timelineHistory)
+		list_free_deep(state->timelineHistory);
+#endif
 	pfree(state->readBuf);
 	pfree(state);
 }
@@ -192,6 +204,10 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 {
 	XLogRecord *record;
 	XLogRecPtr	targetPagePtr;
+	/*
+	 * When validating headers we need to check the pre-link only if we're
+	 * reading sequentally; see ValidXLogRecordHeader.
+	 */
 	bool		randAccess = false;
 	uint32		len,
 total_len;
@@ -208,6 +224,7 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg)
 
 	if (RecPtr == InvalidXLogRecPtr)
 	{
+		/* No explicit start point; read the record after the one we just read */
 		RecPtr = state->EndRecPtr;
 
 		if (state->ReadRecPtr == InvalidXLogRecPtr)
@@ -223,11 +240,13 @@ XLogReadRecord(XLogReaderState *state, XLogRecPtr RecPtr, char 

Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-03-14 Thread Craig Ringer
On 15 March 2016 at 04:48, Andres Freund  wrote:

> On 2016-01-31 05:09:33 +0800, Craig Ringer wrote:
> > On 29 January 2016 at 18:16, Andres Freund  wrote:
> >
> > > Hi,
> > >
> > > so, I'm reviewing the output of:
> > >
> >
> > Thankyou very much for the review.
>
> Afaics you've not posted an updated version of this? Any chance you
> could?
>

I'll try to get to it soon, yeah. I have been focusing on things that
cannot exist as extensions, especially timeline following for logical slots
and failover slots.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-03-14 Thread Andres Freund
On 2016-01-31 05:09:33 +0800, Craig Ringer wrote:
> On 29 January 2016 at 18:16, Andres Freund  wrote:
> 
> > Hi,
> >
> > so, I'm reviewing the output of:
> >
> 
> Thankyou very much for the review.

Afaics you've not posted an updated version of this? Any chance you
could?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-31 Thread Michael Paquier
On Thu, Jan 7, 2016 at 4:50 PM, Craig Ringer  wrote:
> On 7 January 2016 at 01:17, Peter Eisentraut  wrote:
>> On 12/22/15 4:55 AM, Craig Ringer wrote:
>> and we could probably go through them
>> one by one and ask, why do we need this bit?  So that kind of system
>> will be very hard to review as a standalone submission.
>
> Again, I disagree. I think you're looking at this way too narrowly.
>
> I find it quite funny, actually. Here we go and produce something that's a
> nice re-usable component that other people can use in their products and
> solutions ... and all anyone does is complain that the other part required
> to use it as a canned product isn't posted yet (though it is now). But with
> BDR all anyone ever does is complain that it's too tightly coupled to the
> needs of a single product and the features extracted from it, like
> replication origins, should be more generic and general purpose so other
> people can use them in their products too. Which is it going to be?

As far as I can see, this patch is leveraging the current
infrastructure in core, logical decoding to convert the data obtained
as a set JSON blobs via a custom protocol. Its maintenance load looks
minimal, that's at least a good thing.

> It would be helpful if you could take a step back and describe what *you*
> think logical replication for PostgreSQL should look like. You clearly have
> a picture in mind of what it should be, what requirements it satisfies, etc.
> If you're going to argue based on that it'd be very helpful to describe it.
> I might've missed some important points you've seen and you might've
> overlooked issues I've seen.

Personally, if I would put a limit of what should be in-core, or what
should be logical replication from the core prospective, that would be
just to give to potential consumers (understand plugins here) of this
binary data (be it pg_ddl_command or what the logical decoding context
offers) a way to access it and then to allow those plugins to change
this binary data into something that is suited to it, and have simple
tools and example to test those things without relying on anything
external. test_decoding and test_ddl_deparse cover that already. What
I find a bit disturbing regarding this patch is: why would a JSON
representation be able to cover all kinds of needs? Aren't other
replication solution going to have their own data format and their own
protocol with different requirements?

Considering that this module could have a happier life if managed independently.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-30 Thread Craig Ringer
On 29 January 2016 at 18:16, Andres Freund  wrote:

> Hi,
>
> so, I'm reviewing the output of:
>

Thankyou very much for the review.


> > + pglogical_output_plhooks \
>
> I'm doubtful we want these plhooks. You aren't allowed to access normal
> (non user catalog) tables in output plugins. That seems too much to
> expose to plpgsql function imo.
>

You're right. We've got no way to make sure the user sticks to things
that're reasonably safe.

The intent of that module was to allow people to write row and origin
filters in plpgsql, to serve as an example of how to implement hooks, and
to provide a tool usable in testing pglogical_output from pg_regress.

An example can be in C, since it's not safe to do it in plpgsql as you
noted. A few toy functions will be sufficient for test use.

As for allowing users to flexibly filter, I'm stating to think that hooks
in pglogical_output aren't really the best long term option. They're needed
for now, but for 9.7+ we should look at whether it's practical to separate
"what gets forwarded" policy from the mechanics of how it gets sent.
pglogical_output currently just farms part of the logical decoding hook out
to its own hooks, but it wouldn't have to do that if logical decoding let
you plug in policy on what you send separately to how you send it. Either
via catalogs or plugin functions separate to the output plugin.

(Kinda off topic, though, and I think we need the hooks for now, just not
the plpgsql implementation).



> > +++ b/contrib/pglogical_output/README.md
>
> I don't think we've markdown in postgres so far - so let's just keep the
> current content and remove the .md
>

I'm halfway through turning it all into SGML anyway. I just got sidetracked
by other work. I'd be just as happy to leave it as markdown but figured
SGML would likely be required.


>
> > + Table metadata header
> > +
> > +|===
> > +|*Message*|*Type/Size*|*Notes*
> > +
> > +|Message type|signed char|Literal ‘**R**’ (0x52)
> > +|flags|uint8| * 0-6: Reserved, client _must_ ERROR if set and not
> recognised.
> > +|relidentifier|uint32|Arbitrary relation id, unique for this upstream.
> In practice this will probably be the upstream table’s oid, but the
> downstream can’t assume anything.
> > +|nspnamelength|uint8|Length of namespace name
> > +|nspname|signed char[nspnamelength]|Relation namespace (null terminated)
> > +|relnamelength|uint8|Length of relation name
> > +|relname|char[relname]|Relation name (null terminated)
> > +|attrs block|signed char|Literal: ‘**A**’ (0x41)
> > +|natts|uint16|number of attributes
> > +|[fields]|[composite]|Sequence of ‘natts’ column metadata blocks, each
> of which begins with a column delimiter followed by zero or more column
> metadata blocks, each with the same column metadata block header.
>
> That's a fairly high overhead. Hm.
>

Yeah, and it shows in Oleksandr's measurements.  However, that's a metadata
message that is sent only pretty infrequently if you enable relation
metadata caching. Which is really necessary to get reasonable performance
on anything but the simplest workloads, and is only optional because it
makes it easier to write and test a client without it first.


> > +== JSON protocol
> > +
> > +If `proto_format` is set to `json` then the output plugin will emit JSON
> > +instead of the custom binary protocol. JSON support is intended mainly
> for
> > +debugging and diagnostics.
> > +
>
> I'm fairly strongly opposed to including two formats in one output
> plugin. I think the demand for being able to look into the binary
> protocol should instead be satisfied by having a function that "expands"
> the binary data returned into something easier to understand.
>

Per our discussion yesterday I think I agree with you on that now.

My thinking is that we should patch pg_recvlogical to be able to load a
decoder plugin. Then extract the protocol decoding support from pglogical
into a separately usable library that can be loaded by pg_recvlogical,
pglogical downstream, and by SQL-level debug/test helper functions.

pg_recvlogical won't be able to decode binary or internal format field
values, but you can simply not request that they be sent.


> > + case PARAM_BINARY_BASETYPES_MAJOR_VERSION:
> > + val = get_param_value(elem, false,
> OUTPUT_PARAM_TYPE_UINT32);
> > +
>  data->client_binary_basetypes_major_version = DatumGetUInt32(val);
> > + break;
>
> Why is the major version tied to basetypes (by name)? Seem more
> generally useful.
>

I found naming that param rather awkward.

The idea is that we can rely on the Pg major version only for types defined
in core.  It's mostly safe for built-in extensions in that few if any
people ever upgrade them, but it's not strictly correct even there. Most of
them (hstore, etc) don't expose their own versions so it's hard to know
what to do about them.

What I want(ed?) to do is let a downstream enumerate 

Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-29 Thread Andres Freund
Hi,

so, I'm reviewing the output of:
> git diff $(git merge-base upstream/master 
> 2ndq/dev/pglogical-output)..2ndq/dev/pglogical-output
> diff --git a/contrib/Makefile b/contrib/Makefile
> index bd251f6..028fd9a 100644
> --- a/contrib/Makefile
> +++ b/contrib/Makefile
> @@ -35,6 +35,8 @@ SUBDIRS = \
>   pg_stat_statements \
>   pg_trgm \
>   pgcrypto\
> + pglogical_output \
> + pglogical_output_plhooks \

I'm doubtful we want these plhooks. You aren't allowed to access normal
(non user catalog) tables in output plugins. That seems too much to
expose to plpgsql function imo.

> +++ b/contrib/pglogical_output/README.md

I don't think we've markdown in postgres so far - so let's just keep the
current content and remove the .md :P

> + Table metadata header
> +
> +|===
> +|*Message*|*Type/Size*|*Notes*
> +
> +|Message type|signed char|Literal ‘**R**’ (0x52)
> +|flags|uint8| * 0-6: Reserved, client _must_ ERROR if set and not recognised.
> +|relidentifier|uint32|Arbitrary relation id, unique for this upstream. In 
> practice this will probably be the upstream table’s oid, but the downstream 
> can’t assume anything.
> +|nspnamelength|uint8|Length of namespace name
> +|nspname|signed char[nspnamelength]|Relation namespace (null terminated)
> +|relnamelength|uint8|Length of relation name
> +|relname|char[relname]|Relation name (null terminated)
> +|attrs block|signed char|Literal: ‘**A**’ (0x41)
> +|natts|uint16|number of attributes
> +|[fields]|[composite]|Sequence of ‘natts’ column metadata blocks, each of 
> which begins with a column delimiter followed by zero or more column metadata 
> blocks, each with the same column metadata block header.

That's a fairly high overhead. Hm.


> +== JSON protocol
> +
> +If `proto_format` is set to `json` then the output plugin will emit JSON
> +instead of the custom binary protocol. JSON support is intended mainly for
> +debugging and diagnostics.
> +

I'm fairly strongly opposed to including two formats in one output
plugin. I think the demand for being able to look into the binary
protocol should instead be satisfied by having a function that "expands"
the binary data returned into something easier to understand.

> + * Copyright (c) 2012-2015, PostgreSQL Global Development Group

2016 ;)


> + case PARAM_BINARY_BASETYPES_MAJOR_VERSION:
> + val = get_param_value(elem, false, 
> OUTPUT_PARAM_TYPE_UINT32);
> + data->client_binary_basetypes_major_version = 
> DatumGetUInt32(val);
> + break;

Why is the major version tied to basetypes (by name)? Seem more
generally useful.


> + case PARAM_RELMETA_CACHE_SIZE:
> + val = get_param_value(elem, false, 
> OUTPUT_PARAM_TYPE_INT32);
> + data->client_relmeta_cache_size = 
> DatumGetInt32(val);
> + break;

I'm not convinced this a) should be optional b) should have a size
limit. Please argue for that choice. And how the client should e.g. know
about evictions in that cache.



> --- /dev/null
> +++ b/contrib/pglogical_output/pglogical_config.h
> @@ -0,0 +1,55 @@
> +#ifndef PG_LOGICAL_CONFIG_H
> +#define PG_LOGICAL_CONFIG_H
> +
> +#ifndef PG_VERSION_NUM
> +#error  must be included first
> +#endif

Huh?

> +#include "nodes/pg_list.h"
> +#include "pglogical_output.h"
> +
> +inline static bool
> +server_float4_byval(void)
> +{
> +#ifdef USE_FLOAT4_BYVAL
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> +inline static bool
> +server_float8_byval(void)
> +{
> +#ifdef USE_FLOAT8_BYVAL
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> +inline static bool
> +server_integer_datetimes(void)
> +{
> +#ifdef USE_INTEGER_DATETIMES
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> +inline static bool
> +server_bigendian(void)
> +{
> +#ifdef WORDS_BIGENDIAN
> + return true;
> +#else
> + return false;
> +#endif
> +}

Not convinced these should exists, and even moreso exposed in a header.

> +/*
> + * Returns Oid of the hooks function specified in funcname.
> + *
> + * Error is thrown if function doesn't exist or doen't return correct 
> datatype
> + * or is volatile.
> + */
> +static Oid
> +get_hooks_function_oid(List *funcname)
> +{
> + Oid funcid;
> + Oid funcargtypes[1];
> +
> + funcargtypes[0] = INTERNALOID;
> +
> + /* find the the function */
> + funcid = LookupFuncName(funcname, 1, funcargtypes, false);
> +
> + /* Validate that the function returns void */
> + if (get_func_rettype(funcid) != VOIDOID)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +  errmsg("function %s must return void",
> +

Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-18 Thread Tomasz Rybak
W dniu 07.01.2016, czw o godzinie 15∶50 +0800, użytkownik Craig Ringer
napisał:
> On 7 January 2016 at 01:17, Peter Eisentraut  wrote:
> > On 12/22/15 4:55 AM, Craig Ringer wrote:
> > > I'm a touch frustrated by that, as a large part of the point of
> > > submitting the output plugin separately and in advance of the
> > downstream
> > > was to get attention for it separately, as its own entity. A lot
> > of
> > > effort has been put into making this usable for more than just a
> > data
> > > source for pglogical's replication tools.
> > 

Maybe chosen name was not the best one - I assumed from the very 
eginning that it's replication solution and not something separate.

> > I can't imagine that there is a lot of interest in a replication
> > tool
> > where you only get one side of it, no matter how well-designed or
> > general it is.
> Well, the other part was posted most of a week ago.
> 
> http://www.postgresql.org/message-id/5685bb86.5010...@2ndquadrant.com
> 
> ... but this isn't just about replication. At least, not just to
> another PostgreSQL instance. This plugin is designed to be general
> enough to use for replication to other DBMSes (via appropriate
> receivers), to replace trigger-based data collection in existing
> replication systems, for use in audit data collection, etc.
> 
> Want to get a stream of data out of PostgreSQL in a consistent,
> simple way, without having to add triggers or otherwise interfere
> with the origin database? That's the purpose of this plugin, and it
> doesn't care in the slightest what the receiver wants to do with that
> data. It's been designed to be usable separately from pglogical
> downstream and - before the Python tests were rejected in discussions
> on this list - was tested using a test suite completely separate to
> the pglogical downstream using psycopg2 to make sure no unintended
> interdependencies got introduced.
> 
> You can do way more than that with the output plugin but you have to
> write your own downstream/receiver for the desired purpose, since
> using a downstream based on bgworkers and SPI won't make any sense
> outside PostgreSQL.
> 

Put those 3 paragraphs into README.md - and this is not a joke.
This is very good rationale behind this plugin; for now README
starts with link to documentation describing logical decoding
and the second paragraph talks about replication.
So when replication (and only it) is in README, it should be
no wonder that people (only - or mostly) think about replication.

Maybe we should think about changing the name to something like
logical_decoder or logical_streamer, to divorce this plugin
from pglogical? Currently even name suggests tight coupling - and
in other way than it should be. pglogical depends on this plugin,
not the other way around.

> If you just want a canned product to use, see the pglogical post
> above for the downstream code.
> 
>  
> > Ultimately, what people will want to do with this is
> > replicate things, not muse about its design aspects. So if we're
> > going
> >  to ship a replication solution in PostgreSQL core, we should ship
> > all
> > the pieces that make the whole system work.
> I don't buy that argument. Doesn't that mean logical decoding
> shouldn't have been accepted? Or the initial patches for parallel
> query? Or any number of other things that're part of incremental
> development solutions?
> 
> (This also seems to contradict what you then argue below, that the
> proposed feature is too broad and does too much.)
> 
> I'd be happy to see both parts go in, but I'm frustrated that
> nobody's willing to see beyond "replicate from one Pg to another Pg"
> and see all the other things you can do. Want to replicate to Oracle
> / MS-SQL / etc? This will help a lot and solve a significant part of
> the problem for you. Want to stream data to append-only audit logs?
> Ditto. But nope, it's all about PostgreSQL to PostgreSQL.
> 
> Please try to look further into what client applications can do with
> this directly. I already know it meets the needs of the pglogical
> downstream. What I was hoping to achieve with posting the output
> plugin earlier was to get some thought going about what *else* it'd
> be good for.
> 
> Again: pglogical is posted now (it just took longer than expected to
> get ready) and I'll be happy to see both it and the output plugin
> included. I just urge people to look at the output plugin as more
> than a tightly coupled component of pglogical.
> 
> Maybe some quality name bikeshedding for the output plugin would help
> ;)
> 
> > Also, I think there are two kinds of general systems: common core,
> > and
> > all possible features.  A common core approach could probably be
> > made
> > acceptable with the argument that anyone will probably want to do
> > things
> > this way, so we might as well implement it once and give it to
> > people.
> That's what we're going for here. Extensible, something people can
> build on and use.
>  
> >  In a way, the 

Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-07 Thread Jarred Ward
I didn't receive a response on the bugs mailing list for the following bug,
so I was hoping we could triage to someone with more familiarity with
Postgres internals than I to fix.

This ticket seems like folks who are invested in logical decoding.

The attached script is a simple workload that logical decoding is unable to
decode. It causes an unrecoverable crash in the logical decoder with
'ERROR:  subxact logged without previous toplevel record'.

On Thu, Jan 7, 2016 at 12:44 AM, Craig Ringer  wrote:

>
> Here's v5 of the pglogical patch.
>
> Changes:
>
> * Implement relation metadata caching
> * Add the relmeta_cache_size parameter for cache control
> * Add an extension to get version information
> * Create the pglogical_output header directory on install
> * Restore 9.4 compatibility (it's small)
> * Allow row filter hooks to see details of the changed tuple
> * Remove forward_changesets from pglogical_output (use a hook if you want
> this functionality)
>
> I'm not sure if 9.4 compat will be desirable or not. It's handy to avoid
> needing a separate backported version, but also confusing to do a PGXS
> build within a 9.6 tree against 9.4...
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


bug.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-07 Thread Andres Freund
On 2016-01-07 09:28:29 -0800, Jarred Ward wrote:
> I didn't receive a response on the bugs mailing list for the following bug,
> so I was hoping we could triage to someone with more familiarity with
> Postgres internals than I to fix.

Please don't post to unrelated threads, that just confuses things.

Andres

PS: Yes, I do plan to look at that issue at some point.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-06 Thread Greg Stark
On Wed, Jan 6, 2016 at 5:17 PM, Peter Eisentraut  wrote:
> I can't imagine that there is a lot of interest in a replication tool
> where you only get one side of it, no matter how well-designed or
> general it is.

Well I do have another purpose in mind myself so I do appreciate it
being available now and separately.

However you also need to keep in mind that any of these other purposes
will be more or less equally large projects as logical replication.
There's no particular reason to expect one to be able to start up
today and provide feedback faster than the replication code that's
already been under development for ages. I haven't even started on my
pet project and probably won't until February. And I haven't even
thought through the details of it so I don't even know if it'll be a
matter of a few weeks or months or more.

The one project that does seem like it should be fairly fast to get
going and provide a relatively easy way to test the APIs separately
would be an auditing tool. I saw one go by but didn't look into
whether it used logical decoding or another mechanism. One based on
logical decoding does seem like it would let you verify that, for
example, the api gave the right information to filter effectively and
store meta information to index the audit records effectively.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-06 Thread Peter Eisentraut
On 12/22/15 4:55 AM, Craig Ringer wrote:
> I'm a touch frustrated by that, as a large part of the point of
> submitting the output plugin separately and in advance of the downstream
> was to get attention for it separately, as its own entity. A lot of
> effort has been put into making this usable for more than just a data
> source for pglogical's replication tools.

I can't imagine that there is a lot of interest in a replication tool
where you only get one side of it, no matter how well-designed or
general it is.  Ultimately, what people will want to do with this is
replicate things, not muse about its design aspects.  So if we're going
to ship a replication solution in PostgreSQL core, we should ship all
the pieces that make the whole system work.

Also, I think there are two kinds of general systems: common core, and
all possible features.  A common core approach could probably be made
acceptable with the argument that anyone will probably want to do things
this way, so we might as well implement it once and give it to people.
In a way, the logical decoding interface is the common core, as we
currently understand it.  But this submission clearly has a lot of
features beyond just the basics, and we could probably go through them
one by one and ask, why do we need this bit?  So that kind of system
will be very hard to review as a standalone submission.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-06 Thread Craig Ringer
On 7 January 2016 at 01:17, Peter Eisentraut  wrote:

> On 12/22/15 4:55 AM, Craig Ringer wrote:
> > I'm a touch frustrated by that, as a large part of the point of
> > submitting the output plugin separately and in advance of the downstream
> > was to get attention for it separately, as its own entity. A lot of
> > effort has been put into making this usable for more than just a data
> > source for pglogical's replication tools.
>
> I can't imagine that there is a lot of interest in a replication tool
> where you only get one side of it, no matter how well-designed or
> general it is.


Well, the other part was posted most of a week ago.

http://www.postgresql.org/message-id/5685bb86.5010...@2ndquadrant.com

... but this isn't just about replication. At least, not just to another
PostgreSQL instance. This plugin is designed to be general enough to use
for replication to other DBMSes (via appropriate receivers), to replace
trigger-based data collection in existing replication systems, for use in
audit data collection, etc.

Want to get a stream of data out of PostgreSQL in a consistent, simple way,
without having to add triggers or otherwise interfere with the origin
database? That's the purpose of this plugin, and it doesn't care in the
slightest what the receiver wants to do with that data. It's been designed
to be usable separately from pglogical downstream and - before the Python
tests were rejected in discussions on this list - was tested using a test
suite completely separate to the pglogical downstream using psycopg2 to
make sure no unintended interdependencies got introduced.

You can do way more than that with the output plugin but you have to write
your own downstream/receiver for the desired purpose, since using a
downstream based on bgworkers and SPI won't make any sense outside
PostgreSQL.

If you just want a canned product to use, see the pglogical post above for
the downstream code.



> Ultimately, what people will want to do with this is
> replicate things, not muse about its design aspects. So if we're going

to ship a replication solution in PostgreSQL core, we should ship all
> the pieces that make the whole system work.
>

I don't buy that argument. Doesn't that mean logical decoding shouldn't
have been accepted? Or the initial patches for parallel query? Or any
number of other things that're part of incremental development solutions?

(This also seems to contradict what you then argue below, that the proposed
feature is too broad and does too much.)

I'd be happy to see both parts go in, but I'm frustrated that nobody's
willing to see beyond "replicate from one Pg to another Pg" and see all the
other things you can do. Want to replicate to Oracle / MS-SQL / etc? This
will help a lot and solve a significant part of the problem for you. Want
to stream data to append-only audit logs? Ditto. But nope, it's all about
PostgreSQL to PostgreSQL.

Please try to look further into what client applications can do with this
directly. I already know it meets the needs of the pglogical downstream.
What I was hoping to achieve with posting the output plugin earlier was to
get some thought going about what *else* it'd be good for.

Again: pglogical is posted now (it just took longer than expected to get
ready) and I'll be happy to see both it and the output plugin included. I
just urge people to look at the output plugin as more than a tightly
coupled component of pglogical.

Maybe some quality name bikeshedding for the output plugin would help ;)

Also, I think there are two kinds of general systems: common core, and
> all possible features.  A common core approach could probably be made
> acceptable with the argument that anyone will probably want to do things
> this way, so we might as well implement it once and give it to people.
>

That's what we're going for here. Extensible, something people can build on
and use.


> In a way, the logical decoding interface is the common core, as we
> currently understand it.  But this submission clearly has a lot of
> features beyond just the basics


Really? What would you cut? What's beyond the basics here? What basics are
you thinking of, i.e what set of requirements are you working towards /
needs are you seeking to meet?

We cut this to the bone to produce a minimum viable logical replication
solution. Especially the output plugin.

Cut the hook interfaces for row and xact filtering? You lose the ability to
use replication origins, crippling functionality, and for no real gain in
simplicity.

Remove JSON support? That's what most people are actually likely to want to
use when using the output plugin directly, and it's important for
debugging/tracing/diagnostics. It's a separate feature, to be sure, but
it's also a pretty trivial addition.


> and we could probably go through them
> one by one and ask, why do we need this bit?  So that kind of system
> will be very hard to review as a standalone submission.
>

Again, I disagree. I think 

Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-12-22 Thread Craig Ringer
On 22 December 2015 at 15:17, Michael Paquier 
wrote:

> On Mon, Dec 7, 2015 at 12:10 PM, Craig Ringer 
> wrote:
> > Removed, change pushed.
> >
> > Also pushed a change to expose the decoded row data to row filter hooks.
> >
> > I won't cut a v4 for this, as I'm working on merging the SGML-ified docs
> and
> > will do a v4 with that and the above readme change once that's done.
>
> Patch is moved to next CF, you seem to be still working on it.


Thanks.

Other than SGML-ified docs it's ready. The docs are a hard pre-req of
course. In any case most people appear to be waiting for the downstream
before looking at it at all, so bumping it makes sense.

I'm a touch frustrated by that, as a large part of the point of submitting
the output plugin separately and in advance of the downstream was to get
attention for it separately, as its own entity. A lot of effort has been
put into making this usable for more than just a data source for
pglogical's replication tools. In retrospect naming it pglogical_output was
probably unwise.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-12-22 Thread Robert Haas
On Tue, Dec 22, 2015 at 4:55 AM, Craig Ringer  wrote:
> I'm a touch frustrated by that, as a large part of the point of submitting
> the output plugin separately and in advance of the downstream was to get
> attention for it separately, as its own entity. A lot of effort has been put
> into making this usable for more than just a data source for pglogical's
> replication tools. In retrospect naming it pglogical_output was probably
> unwise.

It's not too late to rename it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-12-21 Thread Michael Paquier
On Mon, Dec 7, 2015 at 12:10 PM, Craig Ringer  wrote:
> Removed, change pushed.
>
> Also pushed a change to expose the decoded row data to row filter hooks.
>
> I won't cut a v4 for this, as I'm working on merging the SGML-ified docs and
> will do a v4 with that and the above readme change once that's done.

Patch is moved to next CF, you seem to be still working on it..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-12-06 Thread Craig Ringer
On 2 December 2015 at 18:38, Petr Jelinek  wrote:

> First, I wonder if it would be useful to mention somewhere, even if it's
> only here in the mailing list how can the protocol be extended in
> non-breaking way in future for transaction streaming if we ever get that.


Good point.

I'll address that in the DESIGN.md in the next rev.

Separately, it's looking like xact streaming is possibly more complex than
I hoped due to cache invalidation issues, but I haven't been able to fully
understand the problem yet.

The other thing is that I think we don't need the "forward_changesets"
> parameter which currently decides if to forward changes that didn't
> originate on local node. There is already hook for origin filtering which
> provides same functionality in more flexible way so it seems redundant to
> also have special boolean option for it.


Removed, change pushed.

Also pushed a change to expose the decoded row data to row filter hooks.

I won't cut a v4 for this, as I'm working on merging the SGML-ified docs
and will do a v4 with that and the above readme change once that's done.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-12-02 Thread Petr Jelinek

Hi,

I can't really do huge review considering I wrote half of the code, but 
I have couple of things I noticed.


First, I wonder if it would be useful to mention somewhere, even if it's 
only here in the mailing list how can the protocol be extended in 
non-breaking way in future for transaction streaming if we ever get 
that. I am mainly asking for this because the protocol does not 
currently send xid for every change as it's not necessary, but for 
streaming it will be. I know of couple of ways myself but I think they 
should be described publicly.


The other thing is that I think we don't need the "forward_changesets" 
parameter which currently decides if to forward changes that didn't 
originate on local node. There is already hook for origin filtering 
which provides same functionality in more flexible way so it seems 
redundant to also have special boolean option for it.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-18 Thread Tomasz Rybak
W dniu 12.11.2015, czw o godzinie 22∶23 +0800, użytkownik Craig Ringer
napisał:
> Hi all
> 
> Here's an updated pglogical_output patch.
> 
> Selected changes since v1:
> 
>     - add json protocol output support
>     - fix encoding comparisons to use parsed encoding not string name
>     - import protocol documentation
>     - significantly expand pg_regress tests
>     - move pglogical_output_plhooks to a top-level contrib
>     - remove 9.4 compatibility

I've just skimmed through the patch.
As you removed 9.4 compatibility - are mentions of 9.4 and 9.5 
compatibility needed in README.md and README.plhooks?
It's not much text, but I'm not sure whether they shouldn't be
removed for 9.6-targeting change.

-- 
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-15 Thread Craig Ringer
On 16 November 2015 at 09:57, Peter Eisentraut  wrote:

> On 11/2/15 7:17 AM, Craig Ringer wrote:
> > The output plugin is suitable for a number of uses. It's designed
> > primarily to supply a data stream to drive a logical replication
> > client running in another PostgreSQL instance, like the pglogical
> > client discussed at PGConf.EU 2015.
>
> So where is that client?
>

Not finished baking yet - in particular, the catalogs and UI are still in
flux. The time scale for getting that out is in the order of a few weeks.

The output plugin stands alone to a fair degree, especially with the json
output support. Comments would be greatly appreciated, especially from
people who're involved in replication, are currently using triggers to feed
data to external systems, etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-15 Thread Peter Eisentraut
On 11/2/15 7:17 AM, Craig Ringer wrote:
> The output plugin is suitable for a number of uses. It's designed
> primarily to supply a data stream to drive a logical replication
> client running in another PostgreSQL instance, like the pglogical
> client discussed at PGConf.EU 2015.

So where is that client?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-12 Thread Amit Langote

On 2015/11/02 23:36, Craig Ringer wrote:
> On 2 November 2015 at 20:17, Craig Ringer  wrote:
>> Hi all
>>
>> I'd like to submit pglogical_output for inclusion in the 9.6 series as
>> a contrib.
> 
> Here's the protocol documentation discussed in the README. It's
> asciidoc at the moment, so it can be formatted into something with
> readable tables.

Kudos! From someone who doesn't really read wire protocol docs a lot, this
was such an enlightening read.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-11 Thread Craig Ringer
On 3 November 2015 at 02:58, Jim Nasby  wrote:

> On 11/2/15 8:36 AM, Craig Ringer wrote:
>
>> Here's the protocol documentation discussed in the README. It's
>> asciidoc at the moment, so it can be formatted into something with
>> readable tables.
>>
>
> Is this by chance up on github? It'd be easier to read the final output
> there than the raw asciidoctor. ;)
>

HTML for the protocol documentation attached.

The docs are being converted to SGML at the moment.

I'll be pushing an update of pglogical_output soon. Patch in a followup
mail.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
<<< text/html; charset=UTF-8; name="protocol.html": Unrecognized >>>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-03 Thread Craig Ringer
On 3 November 2015 at 16:41, Craig Ringer  wrote:
> On 3 November 2015 at 02:58, Jim Nasby  wrote:
>> On 11/2/15 8:36 AM, Craig Ringer wrote:
>>>
>>> Here's the protocol documentation discussed in the README. It's
>>> asciidoc at the moment, so it can be formatted into something with
>>> readable tables.
>>
>>
>> Is this by chance up on github? It'd be easier to read the final output
>> there than the raw asciidoctor. ;)
>
> Not yet, no. Should be able to format it to PDF, HTML, etc if needed though.

In fact, I'll just put a PDF up shortly.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-03 Thread Craig Ringer
On 3 November 2015 at 02:58, Jim Nasby  wrote:
> On 11/2/15 8:36 AM, Craig Ringer wrote:
>>
>> Here's the protocol documentation discussed in the README. It's
>> asciidoc at the moment, so it can be formatted into something with
>> readable tables.
>
>
> Is this by chance up on github? It'd be easier to read the final output
> there than the raw asciidoctor. ;)

Not yet, no. Should be able to format it to PDF, HTML, etc if needed though.

Depending on the consensus here, I'm expecting the protocol docs will
likely get turned into a plaintext formatted README in the source
tree, or into SGML docs.

The rest are in rather more readable Markdown form at this point,
again pending opinions on where they should live - in the public SGML
docs or in-tree READMEs.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-02 Thread Shulgin, Oleksandr
On Mon, Nov 2, 2015 at 1:17 PM, Craig Ringer  wrote:

> Hi all
>
> I'd like to submit pglogical_output for inclusion in the 9.6 series as
> a contrib.


Yay, that looks pretty advanced! :-)

Still digesting...

--
Alex


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-02 Thread Craig Ringer
On 2 November 2015 at 20:17, Craig Ringer  wrote:
> Hi all
>
> I'd like to submit pglogical_output for inclusion in the 9.6 series as
> a contrib.

Here's the protocol documentation discussed in the README. It's
asciidoc at the moment, so it can be formatted into something with
readable tables.

If everyone thinks it's reasonable to document the pglogical protocol
as part of the SGML docs then it can be converted. Since the walsender
protocol is documented in the public SGML docs it probably should be,
it's just a matter of deciding what goes where.

Thanks to Darko Milojković for the asciidoc conversion. All errors are
likely to be my edits.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
= Pg_logical protocol

This isn’t a whole new top-level TCP protocol - it uses the libpq and walsender 
protocols, rather than replacing them. It’s is a protocol within a protocol 
within a protocol.

This protocol is an inner layer in a stack:

 * tcp or unix sockets
 ** libpq protocol
 *** walsender COPY BOTH mode
  pg_logical output plugin => consumer protocol

This protocol has evolved out of the BDR protocol. For why changes were needed, 
see the document ++_Development plan_++ and ++_Rationale 
for protocol changes_++.

== ToC


== Notes for understanding the protocol documentation


When reading the protocol docs, note that:

 * The walsender IDENTIFY SYSTEM message exposes upstream’s:
 ** sysid
 ** tli (Timeline Identifier)
 ** xlogpos (LSN, or Log Sequence Number)
 ** logptr
 ** dbname (not db oid)
 * We can accept an arbitrary list of params to START LOGICAL REPLICATION. 
After that, the only information we can send from downstream to upstream is 
replay progress feedback messages.
== 
== 
== Protocol messages

The individual protocol messages are discussed in the following sub sections. 
Protocol flow and logic comes in the next major section.

Absolutely all top-level protocol messages begin with a message type byte. 
While represented in code as a character, this is a signed byte with no 
associated encoding.

Since the PostgreSQL libpq COPY protocol supplies a message length there’s no 
need for top-level protocol messages to embed a length in their header.
=== 
=== BEGIN message

A stream of rows starts with a BEGIN message. Rows may only be sent after a 
BEGIN and before a COMMIT.

|===
|*Message*|*Type/Size*|*Notes*

|Message type|signed char|Literal ‘**B**’ (0x42)
|flags|uint8| * 0-3: Reserved, client _must_ ERROR if set and not recognised.
|lsn|uint64|“final_lsn” in decoding context - currently it means lsn of commit
|commit time|uint64|“commit_time” in decoding context
|remote XID|uint32|“xid” in decoding context
|===

=== 
=== 
=== Forwarded transaction origin message

Sent if and only if the immediately prior message was a _BEGIN_ message with 
the FORWARDED_TRANSACTION flag set, the message after the BEGIN _must_ be a 
_forwarded transaction origin _message indicating what upstream host the 
transaction came from.

It is a protocol error to send/receive a forwarded transaction origin message 
at any other time.

The origin identifier is of arbitrary and application-defined format. 
Applications _should_ prefix their origin identifier with a fixed application 
name part, like _bdr__, _myapp__, etc. It is application-defined what an 
application does with forwarded transactions from other applications.

The origin identifier is typically closely related to replication slot names 
and replication origins’ names in an application system.

For more detail see _Changeset Forwarding_ in the README.


|===
|*Message*|*Type/Size*|*Notes*

|Message type|signed char|Literal ‘**O**’ (0x4f)
|flags|uint8| * 0-3: Reserved, application _must_ ERROR if set and not 
recognised
|origin_lsn|uint64|Log sequence number (LSN, XLogRecPtr) of the transaction’s 
commit record on its origin node (as opposed to the forwarding node’s commit 
LSN, which is ‘lsn’ in the BEGIN message)
|origin_identifier_length|uint8|Length in bytes of origin_identifier
|origin_identifier|signed char[origin_identifier_length]|An origin identifier 
of arbitrary, upstream-application-defined structure. _Should_ be text in the 
same encoding as the upstream database. NULL-terminated. _Should_ be 7-bit 
ASCII.
|===

== 
=== COMMIT message
A stream of rows ends with a COMMIT message.

There is no ROLLBACK message because aborted transactions are not sent by the 
upstream.


|===
|*Message*|*Type/Size*|*Notes*

|Message type|signed char|Literal ‘**C**’ (0x43)
|Flags|uint8| * 0-3: Reserved, client _must_ ERROR if set and not recognised
|Commit LSN|uint64|commit_lsn in decoding commit decode callback. This is the 
same value as in the BEGIN message, and marks the end of the transaction.
|End LSN|uint64|end_lsn in decoding transaction context
|Commit time|uint64|commit_time in decoding transaction context
|===


=== INSERT, UPDATE or 

Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-02 Thread Craig Ringer
On 2 November 2015 at 20:17, Craig Ringer  wrote:
> Hi all
>
> I'd like to submit pglogical_output for inclusion in the 9.6 series as
> a contrib.

A few points are likely to come up in anything but the most cursory
examination of the patch.

The README alludes to protocol docs that aren't in the tree. A
followup will add them shortly, they just need a few tweaks.

There are pg_regress tests, but they're limited. The output plugin
uses the binary output mode, and pg_regress doesn't play well with
that at all. Timestamps, XIDs, LSNs, etc are embedded in the output.
Also, pglogical its self emits LSNs and timestamps in commit messages.
Some things, like the startup message, are likely to contain variable
data in future too. So we can't easily do a "dumb" comparison of
expected output.

That's why the bulk of the tests in test/ are in Python, using
psycopg2. Python and psycopg2 were used partly because of the
excellent work done by Oleksandr Shulgin at Zalando
(https://github.com/zalando/psycopg2/tree/feature/replication-protocol,
https://github.com/psycopg/psycopg2/pull/322) which means we can
connect to the walsender and consume the replication protocol, rather
than relying only on the SQL interfaces. Both are supported, and only
the SQL interface is used by default. It also means the tests can have
logic to validate the protocol stream, examining it message by message
to ensure it's exactly what's expected. Rather than a diff where two
lines of binary gibberish don't match, you get a specific error. Of
course, I'm aware that the buildfarm animals aren't required to have
Python, let alone a patched psycopg2, so we can't rely on these as
smoketests. That's why the pg_regress tests are there too.

There another extension inside it, in
contrib/pglogical_output/examples/hooks . I'm not sure if this should
be separated out into a separate contrib/ since it's very tightly
coupled to pglogical_output. Its purpose is to expose the hooks from
pglogical_output to SQL, so that they can be implemented by plpgsql or
whatever, instead of having to be C functions. It's not integrated
into pglogical_output proper because I see this as mainly a test and
prototyping facility. It's necessary to have this in order for the
unit tests to cover filtering and hooks, but most practical users will
never want or need it. So I'd rather not integrate it into
pglogical_output proper.

pglogical_output has headers, and it installs them into Pg's include/
tree at install time. This is not something that prior contribs have
done, so there's no policy for it as yet. The reason for doing so is
that the output plugin exposes a hooks API so that it can be reused by
different clients with different needs, rather than being tightly
coupled to just one downstream user. For example, it makes no
assumptions about things like what replication origin names mean -
keeping with the design of replication origins, which just provide
mechanism without policy. That means that the client needs to tell the
output plugin how to filter transactions if it wants to do selective
replication on a node-by-node basis. Similarly, there's no built-in
support for selective replication on a per-table basis, just a hook
you can implement. So clients can provide their own policy for how to
decide what tables to replicate. When we're calling hooks for each and
every row we really want a C function pointer so we can avoid the need
to go through the fmgr each time, and so we can pass a `struct
Relation` and bypass the need for catalog lookups. That sort of thing.

Table metadata is sent for each row. It really needs to be sent once
for each consecutive series of rows for the same table. Some care is
required to make sure it's invalidated and re-sent when the table
structure changes mid-series. So that's a pending change. It's
important for efficiency, but pretty isolated and doesn't make the
plugin less useful otherwise, so I thought it could wait.

Sending the whole old tuple is not yet supported, per the fixme in
pglogical_write_update . It should really be a TODO, since to support
this we really need a way to keep track of replica identity for a
table, but also WAL-log the whole old tuple. (ab)using REPLICA
IDENTITY FULL to log the old tuple means we lose information about
what the real identity key is. So this is more of a wanted future
feature, and I'll change it to a TODO.

I'd like to delay some ERROR messages until after the startup
parameters are sent. That way the client can see more info about the
server's configuration, version, capabilities, etc, and possibly
reconnect with acceptable settings. Because a logical decoding plugin
isn't allowed to generate input during its startup callback, though,
this could mean indefinitely delaying an error until the upstream does
some work that results in a decoding callback. So for now errors on
protocol mismatches, etc, are sent immediately.

Text encoding names are compared byte-wise. They should be 

Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-02 Thread Jim Nasby

On 11/2/15 8:36 AM, Craig Ringer wrote:

Here's the protocol documentation discussed in the README. It's
asciidoc at the moment, so it can be formatted into something with
readable tables.


Is this by chance up on github? It'd be easier to read the final output 
there than the raw asciidoctor. ;)


Great work on this!
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-02 Thread Andres Freund
Hi,

On 2015-11-02 20:17:21 +0800, Craig Ringer wrote:
> I'd like to submit pglogical_output for inclusion in the 9.6 series as
> a contrib.

Cool!

> See the README.md and DESIGN.md in the attached patch for details on
> the plugin. I will follow up with a summary in a separate mail, along
> with a few points I'd value input on or want to focus discussion on.

Sounds to me like at least a portion of this should be in sgml, either
in a separate contrib page or in the logical decoding section.

A quick readthrough didn't have a separate description of the
"sub-protocol" in which changes and such are encoded - I think we're
going to need that.

> I anticipate that I'll be following up with a few tweaks, but at this
> point the plugin is feature-complete, documented and substantially
> ready for inclusion as a contrib.

There's a bunch of changes that are hinted at in the files in various
places. Could you extract the ones you think need to be fixed before
integration see in some central place (or just an email)?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-11-02 Thread Craig Ringer
On 2 November 2015 at 20:35, Andres Freund  wrote:
> On 2015-11-02 20:17:21 +0800, Craig Ringer wrote:
>
>> See the README.md and DESIGN.md in the attached patch for details on
>> the plugin. I will follow up with a summary in a separate mail, along
>> with a few points I'd value input on or want to focus discussion on.
>
> Sounds to me like at least a portion of this should be in sgml, either
> in a separate contrib page or in the logical decoding section.

Yes, I think so. Before rewriting to SGML I wanted to solicit opinions
on what should be hidden away in a for-developers README file and what
parts deserve exposure in the public docs.

> A quick readthrough didn't have a separate description of the
> "sub-protocol" in which changes and such are encoded - I think we're
> going to need that.

It didn't quite make the first cut as I have to make a couple of edits
to reflect late changes. I should be able to follow up with that later
today.

The protocol design documentation actually predates the plugin its
self, though it saw a few changes where it became clear something
wouldn't work as envisioned. It's been quite a pleasure starting with
a detailed design, then implementing it.

> There's a bunch of changes that are hinted at in the files in various
> places. Could you extract the ones you think need to be fixed before
> integration see in some central place (or just an email)?

Yep, writing that up at the moment. I didn't want to make the initial
post too verbose.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers