Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-24 Thread Tatsuo Ishii
> Thanks for the feedback.  I have committed it after removing the
> datumIsEqual() call.

Thanks for the patch! I confirmed my examples now work.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Peter Eisentraut
On 6/23/17 13:14, Alvaro Herrera wrote:
> Andres Freund wrote:
>> On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:
>>> Tom Lane wrote:
 Peter Eisentraut  writes:
> Any thoughts about keeping datumAsEqual() as a first check?  I did some
> light performance tests, but it was inconclusive.

 Seems like it would tend to be a win if, in fact, the values are
 usually equal.  If they're usually not, then it's a loser.  Do
 we have any feeling for which case is more common?
>>
>> Seems like a premature optimization to me - if you care about
>> performance and do this frequently, you're not going to end up using
>> FULL.  If we want to performance optimize, it'd probably better to
>> lookup candidate keys and use those if available.
> 
> I can get behind that argument.

Thanks for the feedback.  I have committed it after removing the
datumIsEqual() call.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Alvaro Herrera
Andres Freund wrote:
> On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:
> > Tom Lane wrote:
> > > Peter Eisentraut  writes:
> > > > Any thoughts about keeping datumAsEqual() as a first check?  I did some
> > > > light performance tests, but it was inconclusive.
> > > 
> > > Seems like it would tend to be a win if, in fact, the values are
> > > usually equal.  If they're usually not, then it's a loser.  Do
> > > we have any feeling for which case is more common?
> 
> Seems like a premature optimization to me - if you care about
> performance and do this frequently, you're not going to end up using
> FULL.  If we want to performance optimize, it'd probably better to
> lookup candidate keys and use those if available.

I can get behind that argument.

> > who in their right minds would use floating point columns as part of
> > replica identity ...?
> 
> Since this is FULL, it'll be all columns...

Yeah, I was thinking you shouldn't have floating point columns if you're
going to use FULL as identity.  But you're tautologically right: doing
the wrong thing is likely not the right thing to do.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Andres Freund
On 2017-06-23 13:05:21 -0400, Alvaro Herrera wrote:
> Tom Lane wrote:
> > Peter Eisentraut  writes:
> > > Any thoughts about keeping datumAsEqual() as a first check?  I did some
> > > light performance tests, but it was inconclusive.
> > 
> > Seems like it would tend to be a win if, in fact, the values are
> > usually equal.  If they're usually not, then it's a loser.  Do
> > we have any feeling for which case is more common?

Seems like a premature optimization to me - if you care about
performance and do this frequently, you're not going to end up using
FULL.  If we want to performance optimize, it'd probably better to
lookup candidate keys and use those if available.


> Though, thinking about it, maybe the datumIsEqual test would give the
> wrong answer for floating point values, and there'd be no fallback to
> equality with the logic I propose.  But then maybe that's all
> right ---

I don't think it'd be ok, we shouldn't just do the wrong thing because
we think it's unlikely to happen.


> who in their right minds would use floating point columns as part of
> replica identity ...?

Since this is FULL, it'll be all columns...


- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Alvaro Herrera
Tom Lane wrote:
> Peter Eisentraut  writes:
> > Any thoughts about keeping datumAsEqual() as a first check?  I did some
> > light performance tests, but it was inconclusive.
> 
> Seems like it would tend to be a win if, in fact, the values are
> usually equal.  If they're usually not, then it's a loser.  Do
> we have any feeling for which case is more common?

What about keeping the datumIsEqual test for fixed length pass-by-value
types (I'm mostly thinking about fixed-width integers here ...) and
always use the full blown equality comparison for anything more
elaborate than that?

Though, thinking about it, maybe the datumIsEqual test would give the
wrong answer for floating point values, and there'd be no fallback to
equality with the logic I propose.  But then maybe that's all right ---
who in their right minds would use floating point columns as part of
replica identity ...?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Tom Lane
Peter Eisentraut  writes:
> Any thoughts about keeping datumAsEqual() as a first check?  I did some
> light performance tests, but it was inconclusive.

Seems like it would tend to be a win if, in fact, the values are
usually equal.  If they're usually not, then it's a loser.  Do
we have any feeling for which case is more common?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-23 Thread Peter Eisentraut
On 6/20/17 00:10, Andres Freund wrote:
> On 2017-06-20 11:52:10 +0900, Tatsuo Ishii wrote:
>> If my understanding is correct, it would not be easy to fix, no?
>>
>>> We might be able to refine that, but there is a general problem that
>>> without an index and an operator class, we are just doing our random
>>> best to match the values.
> 
> I don't see the problem as being big.  We should just look up the
> default btree opclass and use the relevant operator.  That's a how a
> number of things already work.

Patch for that.

Any thoughts about keeping datumAsEqual() as a first check?  I did some
light performance tests, but it was inconclusive.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 00ea950753c070d440818c71d06e2bbc53b6c294 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 23 Jun 2017 08:55:13 -0400
Subject: [PATCH] Fix replication with replica identity full

The comparison with the target rows on the subscriber side was done with
datumIsEqual(), which can have false negatives.  For instance, it didn't
work reliably for text columns.  So use the equality operator provided
by the type cache as fallback.

Also add more user documentation about replica identity requirements.

Reported-by: Tatsuo Ishii 
---
 doc/src/sgml/logical-replication.sgml  | 28 +++-
 src/backend/executor/execReplication.c | 22 +-
 src/test/subscription/t/001_rep_changes.pl | 23 ---
 3 files changed, 64 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/logical-replication.sgml 
b/doc/src/sgml/logical-replication.sgml
index 92ec175af1..fa8ae536d9 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -110,11 +110,29 @@ Publication
Publications can choose to limit the changes they produce to
any combination of INSERT, UPDATE, and
DELETE, similar to how triggers are fired by
-   particular event types.  If a table without a REPLICA
-   IDENTITY is added to a publication that
-   replicates UPDATE or DELETE
-   operations then subsequent UPDATE
-   or DELETE operations will fail on the publisher.
+   particular event types.  By default, all operation types are replicated.
+  
+
+  
+   A published table must have a replica identity configured in
+   order to be able to replicate UPDATE
+   and DELETE operations, so that appropriate rows to
+   update or delete can be identified on the subscriber side.  By default,
+   this is the primary key, if there is one.  Another unique index (with
+   certain additional requirements) can also be set to be the replica
+   identity.  If the table does not have any suitable key, then it can be set
+   to replica identity full, which means the entire row becomes
+   the key.  This, however, is very inefficient and should only be used as a
+   fallback if no other solution is possible.  If a replica identity other
+   than full is set on the publisher side, a replica identity
+   comprising the same or fewer columns must also be set on the subscriber
+   side.  See  for details on
+   how to set the replica identity.  If a table without a replica identity is
+   added to a publication that replicates UPDATE
+   or DELETE operations then
+   subsequent UPDATE or DELETE
+   operations will cause an error on the publisher.  INSERT
+   operations can proceed regardless of any replica identity.
   
 
   
diff --git a/src/backend/executor/execReplication.c 
b/src/backend/executor/execReplication.c
index 6dae79a8f0..97da47a1e2 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -24,12 +24,14 @@
 #include "parser/parsetree.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
+#include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
+#include "utils/typcache.h"
 #include "utils/tqual.h"
 
 
@@ -245,9 +247,27 @@ tuple_equals_slot(TupleDesc desc, HeapTuple tup, 
TupleTableSlot *slot)
continue;
 
att = desc->attrs[attrnum];
+
+   /*
+* To compare for equality, first try datumIsEqual().  If that 
returns
+* false, try the equality operator.
+*/
if (!datumIsEqual(values[attrnum], slot->tts_values[attrnum],
  att->attbyval, att->attlen))
-   return false;
+   {
+   TypeCacheEntry *typentry;
+
+   typentry = lookup_type_cache(att->atttypid, 
TYPECACHE_EQ_OPR_FINFO);
+   if (!OidIsValid(typentry->eq_opr_finfo.fn_oid))
+   ereport(ERROR,
+   

Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Andres Freund
On 2017-06-20 11:52:10 +0900, Tatsuo Ishii wrote:
> If my understanding is correct, it would not be easy to fix, no?
> 
> > We might be able to refine that, but there is a general problem that
> > without an index and an operator class, we are just doing our random
> > best to match the values.

I don't see the problem as being big.  We should just look up the
default btree opclass and use the relevant operator.  That's a how a
number of things already work.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Tatsuo Ishii
>> For example, if the table consists of only INT types, REPLICA IDENTITY
>> FULL works with UPDATE/DELETE (i.e. replicated), but if some of them
>> are TEXT types, then UPDATE/DELETE does not work.
>> 
>> See up thread for more details.
> 
> Right, but that's just a bug, nothing else.

If my understanding is correct, it would not be easy to fix, no?

> We might be able to refine that, but there is a general problem that
> without an index and an operator class, we are just doing our random
> best to match the values.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Andres Freund
On 2017-06-20 11:46:13 +0900, Tatsuo Ishii wrote:
> >> Yes, that's my understanding too. However, the feature may or may not
> >> work depending on the data types of columns, probably I will not
> >> recommend users/my customers to use it.
> > 
> > I'm not sure how datatypes are playing into this?
> 
> For example, if the table consists of only INT types, REPLICA IDENTITY
> FULL works with UPDATE/DELETE (i.e. replicated), but if some of them
> are TEXT types, then UPDATE/DELETE does not work.
> 
> See up thread for more details.

Right, but that's just a bug, nothing else.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Tatsuo Ishii
>> Yes, that's my understanding too. However, the feature may or may not
>> work depending on the data types of columns, probably I will not
>> recommend users/my customers to use it.
> 
> I'm not sure how datatypes are playing into this?

For example, if the table consists of only INT types, REPLICA IDENTITY
FULL works with UPDATE/DELETE (i.e. replicated), but if some of them
are TEXT types, then UPDATE/DELETE does not work.

See up thread for more details.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Andres Freund
On 2017-06-20 11:29:06 +0900, Tatsuo Ishii wrote:
> > Nothing, and that's what I do. I just think it's a bit fuzzy. Maybe
> > I'm misunderstanding the purpose of REPLICA IDENTITY, but I read it as
> > saying "this is the replication key for this relation".
> 
> Yes, that's my understanding too. However, the feature may or may not
> work depending on the data types of columns, probably I will not
> recommend users/my customers to use it.

I'm not sure how datatypes are playing into this?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Tatsuo Ishii
> Nothing, and that's what I do. I just think it's a bit fuzzy. Maybe
> I'm misunderstanding the purpose of REPLICA IDENTITY, but I read it as
> saying "this is the replication key for this relation".

Yes, that's my understanding too. However, the feature may or may not
work depending on the data types of columns, probably I will not
recommend users/my customers to use it.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Craig Ringer
On 20 June 2017 at 09:47, Andres Freund  wrote:
> On 2017-06-20 09:45:27 +0800, Craig Ringer wrote:
>> I frequently want to be able to use REPLICA IDENTITY DEFAULT, but also
>> record the whole old tuple not just keys, so they can be used in
>> conflict processing etc.
>
> What stops you from automatically using a candidate key if available?

Nothing, and that's what I do. I just think it's a bit fuzzy. Maybe
I'm misunderstanding the purpose of REPLICA IDENTITY, but I read it as
saying "this is the replication key for this relation". If you use
REPLICA IDENTITY FULL then the replication tool goes "nah, I think
I'll pick the PK instead" is that really right?

It's not a major issue.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Craig Ringer
On 20 June 2017 at 06:53, Peter Eisentraut
 wrote:
> On 6/18/17 23:11, Tatsuo Ishii wrote:
>> While playing around with logical replication, I am confused by the
>> behavior of REPLICA IDENTITY FULL.
>
>> However, if a table has text columns, UPDATE/DELETE replication does
>> not work any more. Am I missing something?
>
> This is apparently because for replica identity full the comparison of
> the search key against the tuple value goes through datumIsEqual(),
> which doesn't work for TOAST values.

Personally I think REPLICA IDENTITY FULL conflates two related things.

One is "record the whole old value of the tuple in xlog so logical
decoding can access it".

Quite separately, there is "treat the full tuple as the replica identity key".

I frequently want to be able to use REPLICA IDENTITY DEFAULT, but also
record the whole old tuple not just keys, so they can be used in
conflict processing etc.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Andres Freund
On 2017-06-20 09:45:27 +0800, Craig Ringer wrote:
> I frequently want to be able to use REPLICA IDENTITY DEFAULT, but also
> record the whole old tuple not just keys, so they can be used in
> conflict processing etc.

What stops you from automatically using a candidate key if available?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Tatsuo Ishii
> This is apparently because for replica identity full the comparison of
> the search key against the tuple value goes through datumIsEqual(),
> which doesn't work for TOAST values.
> 
> We might be able to refine that, but there is a general problem that
> without an index and an operator class, we are just doing our random
> best to match the values.

In other word, pass-by-value types work in this case? If so, we can
document it or throw an error while executing ALTER REPLICA IDENTITY
FULL on tables consisting of non pass-by-values column types to
mitigate the problem.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REPLICA IDENTITY FULL

2017-06-19 Thread Peter Eisentraut
On 6/18/17 23:11, Tatsuo Ishii wrote:
> While playing around with logical replication, I am confused by the
> behavior of REPLICA IDENTITY FULL.

> However, if a table has text columns, UPDATE/DELETE replication does
> not work any more. Am I missing something?

This is apparently because for replica identity full the comparison of
the search key against the tuple value goes through datumIsEqual(),
which doesn't work for TOAST values.

We might be able to refine that, but there is a general problem that
without an index and an operator class, we are just doing our random
best to match the values.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers