Re: [HACKERS] replication identifier format

2014-06-23 Thread Robert Haas
On Wed, Jun 18, 2014 at 12:46 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
  I actually don't think any of the discussions I was involved in had the
  externally visible version of replication identifiers limited to 16bits?
  If you are referring to my patch, 16bits was just the width of the
  *internal* name that should basically never be looked at. User visible
  replication identifiers are always identified by an arbitrary string -
  whose format is determined by the user of the replication identifier
  facility. *BDR* currently stores the system identifier, the database id
  and a name in there - but that's nothing core needs to concern itself
  with.

 I don't think you're going to be able to avoid users needing to know
 about those IDs.  The configuration table is going to have to be the
 same on all nodes, and how are you going to get that set up without
 those IDs being user-visible?

 Why? Users and other systems only ever see the external ID. Everything
 leaving the system is converted to the external form. The short id
 basically is only used in shared memory and in wal records. For both
 using longer strings would be problematic.

 In the patch I have the user can actually see them as they're stored in
 pg_replication_identifier, but there should never be a need for that.

Hmm, so there's no requirement that the short IDs are consistent
across different clusters that are replicating to each other?  If
that's the case, that might address my concern, but I'd probably want
to go back through the latest patch and think about it a bit more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] replication identifier format

2014-06-23 Thread Andres Freund
On 2014-06-23 10:09:49 -0400, Robert Haas wrote:
 On Wed, Jun 18, 2014 at 12:46 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
   I actually don't think any of the discussions I was involved in had the
   externally visible version of replication identifiers limited to 16bits?
   If you are referring to my patch, 16bits was just the width of the
   *internal* name that should basically never be looked at. User visible
   replication identifiers are always identified by an arbitrary string -
   whose format is determined by the user of the replication identifier
   facility. *BDR* currently stores the system identifier, the database id
   and a name in there - but that's nothing core needs to concern itself
   with.
 
  I don't think you're going to be able to avoid users needing to know
  about those IDs.  The configuration table is going to have to be the
  same on all nodes, and how are you going to get that set up without
  those IDs being user-visible?
 
  Why? Users and other systems only ever see the external ID. Everything
  leaving the system is converted to the external form. The short id
  basically is only used in shared memory and in wal records. For both
  using longer strings would be problematic.
 
  In the patch I have the user can actually see them as they're stored in
  pg_replication_identifier, but there should never be a need for that.
 
 Hmm, so there's no requirement that the short IDs are consistent
 across different clusters that are replicating to each other?

Nope. That seemed to be a hard requirement in the earlier discussions we
had (~2 years ago).

  If
 that's the case, that might address my concern, but I'd probably want
 to go back through the latest patch and think about it a bit more.

I'll send out a new version after I'm finished with the newest atomic
ops patch.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] replication identifier format

2014-06-23 Thread Robert Haas
On Mon, Jun 23, 2014 at 10:11 AM, Andres Freund and...@2ndquadrant.com wrote:
  Why? Users and other systems only ever see the external ID. Everything
  leaving the system is converted to the external form. The short id
  basically is only used in shared memory and in wal records. For both
  using longer strings would be problematic.
 
  In the patch I have the user can actually see them as they're stored in
  pg_replication_identifier, but there should never be a need for that.

 Hmm, so there's no requirement that the short IDs are consistent
 across different clusters that are replicating to each other?

 Nope. That seemed to be a hard requirement in the earlier discussions we
 had (~2 years ago).

Oh, great.  Somehow I missed the fact that that had been addressed.  I
had assumed that we still needed global identifiers in which case I
think they'd need to be 64+ bits (preferably more like 128).  If they
only need to be locally significant that makes things much better.

Is there any real reason to add a pg_replication_identifier table, or
should we just let individual replication solutions manage the
identifiers within their own configuration tables?  I guess one
question is: What happens if there are multiple replication solutions
in use on a single server?  How do they coordinate?

  If
 that's the case, that might address my concern, but I'd probably want
 to go back through the latest patch and think about it a bit more.

 I'll send out a new version after I'm finished with the newest atomic
 ops patch.

Sweet.  I'm a little backed up right now, but will look at it when able.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] replication identifier format

2014-06-23 Thread Andres Freund
On 2014-06-23 10:45:51 -0400, Robert Haas wrote:
 On Mon, Jun 23, 2014 at 10:11 AM, Andres Freund and...@2ndquadrant.com wrote:
   Why? Users and other systems only ever see the external ID. Everything
   leaving the system is converted to the external form. The short id
   basically is only used in shared memory and in wal records. For both
   using longer strings would be problematic.
  
   In the patch I have the user can actually see them as they're stored in
   pg_replication_identifier, but there should never be a need for that.
 
  Hmm, so there's no requirement that the short IDs are consistent
  across different clusters that are replicating to each other?
 
  Nope. That seemed to be a hard requirement in the earlier discussions we
  had (~2 years ago).
 
 Oh, great.  Somehow I missed the fact that that had been addressed.  I
 had assumed that we still needed global identifiers in which case I
 think they'd need to be 64+ bits (preferably more like 128).  If they
 only need to be locally significant that makes things much better.

Well, I was just talking about the 'short ids' here and how they are
used in crash recovery/shmem et al. Those indeed don't need to be
coordinated.
If you ever use logical decoding on a system that receives changes from
other systems (cascading replication, multimaster) you'll likely want to
add the *long* form of that identifier to the output in the output
plugin so the downstream nodes can identify the source. How one
specific replication solution deals with coordinating this between
systems is essentially that suite's problem.
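
To illustrate the split between short and long identifiers described above, here is a minimal sketch in Python. It is not the patch's code: the `Node` class, its methods, and the identifier strings are all hypothetical, and the only detail taken from this thread is that the internal ID is 16 bits wide while the external ID is an arbitrary string. Each node hands out its own short IDs, so the same external identifier can map to different short IDs on different nodes, and only the external form is attached to what leaves the node.

```python
MAX_INTERNAL_ID = 0xFFFF  # 16-bit internal id, the width mentioned in the thread


class Node:
    """Hypothetical node keeping a purely local short-id mapping."""

    def __init__(self, name):
        self.name = name
        self._by_external = {}  # external string -> local short id
        self._by_internal = {}  # local short id -> external string

    def internal_id(self, external):
        """Return the local short id for an external id, allocating one if needed."""
        if external not in self._by_external:
            new_id = len(self._by_external) + 1
            if new_id > MAX_INTERNAL_ID:
                raise OverflowError("out of 16-bit internal identifiers")
            self._by_external[external] = new_id
            self._by_internal[new_id] = external
        return self._by_external[external]

    def emit_change(self, internal, payload):
        """Sketch of output-plugin behaviour: only the long form leaves the node."""
        return {"origin": self._by_internal[internal], "payload": payload}


node_a = Node("a")
node_b = Node("b")

# The two nodes learn about the same origins in a different order,
# so their short ids for "origin_y" differ.
node_a.internal_id("origin_x")
id_on_a = node_a.internal_id("origin_y")
id_on_b = node_b.internal_id("origin_y")

change_from_a = node_a.emit_change(id_on_a, "INSERT ...")
change_from_b = node_b.emit_change(id_on_b, "INSERT ...")
```

Downstream consumers compare the `origin` strings, never the short IDs, which is why the short IDs do not have to be coordinated between systems.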

The external identifier currently is a 'text' column, so essentially
unlimited. (Well, I just noticed that the table currently doesn't have a
toast table assigned, so it's only a couple kb right now, but ...)

 Is there any real reason to add a pg_replication_identifier table, or
 should we just let individual replication solutions manage the
 identifiers within their own configuration tables?

I don't think that'd work. During crash recovery the short/internal IDs
are read from WAL records and need to be unique across *all*
databases. Since there's no way for different replication solutions or
even the same one to coordinate this across databases (as there's no way to
add shared relations) it has to be built in.
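
The uniqueness argument can be sketched in Python as follows. Everything here is hypothetical (`PrivateTable`, `SharedCatalog`, `find_collisions`, and the identifier strings are illustration only, not anything from the patch). The assumption being modelled is that a WAL record carries only the short ID, so if two solutions or two databases number their identifiers independently, the same short ID can end up meaning two different origins.

```python
from itertools import count


class PrivateTable:
    """Hypothetical per-solution (or per-database) table numbering its own ids."""

    def __init__(self):
        self._counter = count(1)
        self.ids = {}  # external string -> short id

    def create(self, external):
        self.ids[external] = next(self._counter)
        return self.ids[external]


class SharedCatalog:
    """Hypothetical cluster-wide catalog: one counter for every caller."""

    def __init__(self):
        self._counter = count(1)
        self.ids = {}  # external string -> short id

    def create(self, external):
        if external in self.ids:
            raise ValueError(f"identifier {external!r} already exists")
        self.ids[external] = next(self._counter)
        return self.ids[external]


def find_collisions(*tables):
    """Return the short ids that map to more than one external name."""
    seen = {}
    for table in tables:
        for external, short in table.ids.items():
            seen.setdefault(short, set()).add(external)
    return {short: names for short, names in seen.items() if len(names) > 1}


# Two solutions managing identifiers on their own both start at 1.
solution_a = PrivateTable()
solution_b = PrivateTable()
solution_a.create("solution_a/node1")
solution_b.create("solution_b/node7")
private_collisions = find_collisions(solution_a, solution_b)

# With one shared catalog the short ids stay unique.
catalog = SharedCatalog()
catalog.create("solution_a/node1")
catalog.create("solution_b/node7")
shared_collisions = find_collisions(catalog)
```

In the private case, short ID 1 is ambiguous when read back from a WAL record; in the shared case, every short ID resolves to exactly one external identifier.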

It's also useful so we can have stuff like the
'pg_replication_identifier_progress' view which tells you internal_id,
external_id, remote_lsn, local_lsn. Just showing the internal ID would
imo be bad.
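
A rough Python sketch of what such a progress view could expose follows. Only the four column names come from this mail; the `ProgressTracker` class, its methods, and the reading of `remote_lsn` as "how far the origin has been replayed" and `local_lsn` as "the matching local position" are assumptions. LSNs are printed in the usual two-part hexadecimal form.

```python
from dataclasses import dataclass


def format_lsn(lsn):
    """Render a 64-bit LSN as high/low 32-bit halves in hex."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


@dataclass
class Progress:
    internal_id: int
    external_id: str
    remote_lsn: int = 0
    local_lsn: int = 0


class ProgressTracker:
    """Hypothetical in-memory stand-in for the progress view."""

    def __init__(self):
        self._rows = {}  # internal id -> Progress

    def register(self, internal_id, external_id):
        self._rows[internal_id] = Progress(internal_id, external_id)

    def advance(self, internal_id, remote_lsn, local_lsn):
        """Record replay progress; positions never move backwards."""
        row = self._rows[internal_id]
        row.remote_lsn = max(row.remote_lsn, remote_lsn)
        row.local_lsn = max(row.local_lsn, local_lsn)

    def view(self):
        """Rows as (internal_id, external_id, remote_lsn, local_lsn)."""
        return [
            (r.internal_id, r.external_id,
             format_lsn(r.remote_lsn), format_lsn(r.local_lsn))
            for r in sorted(self._rows.values(), key=lambda r: r.internal_id)
        ]


tracker = ProgressTracker()
tracker.register(1, "origin_x")
tracker.advance(1, remote_lsn=0x16B374D848, local_lsn=0x1000)
tracker.advance(1, remote_lsn=0x10, local_lsn=0x2000)  # stale remote position is ignored
rows = tracker.view()
```

Showing the external ID next to the internal one is what makes a row readable: the short ID alone says nothing about which remote system it refers to.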

 I guess one
 question is: What happens if there are multiple replication solutions
 in use on a single server?  How do they coordinate?

What's your concern here? You're wondering how they can make sure the
identifiers they create are non-overlapping?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] replication identifier format

2014-06-23 Thread Robert Haas
On Mon, Jun 23, 2014 at 11:28 AM, Andres Freund and...@2ndquadrant.com wrote:
 Oh, great.  Somehow I missed the fact that that had been addressed.  I
 had assumed that we still needed global identifiers in which case I
 think they'd need to be 64+ bits (preferably more like 128).  If they
 only need to be locally significant that makes things much better.

 Well, I was just talking about the 'short ids' here and how they are
 used in crash recovery/shmem et al. Those indeed don't need to be
 coordinated.
 If you ever use logical decoding on a system that receives changes from
 other systems (cascading replication, multimaster) you'll likely want to
 add the *long* form of that identifier to the output in the output
 plugin so the downstream nodes can identify the source. How one
 specific replication solution deals with coordinating this between
 systems is essentially that suite's problem.

OK.

 The external identifier currently is a 'text' column, so essentially
 unlimited. (Well, I just noticed that the table currently doesn't have a
 toast table assigned, so it's only a couple kb right now, but ...)

OK.  I have no clear reason to dislike that.

 Is there any real reason to add a pg_replication_identifier table, or
 should we just let individual replication solutions manage the
 identifiers within their own configuration tables?

 I don't think that'd work. During crash recovery the short/internal IDs
 are read from WAL records and need to be unique across *all*
 databases. Since there's no way for different replication solutions or
 even the same one to coordinate this across databases (as there's no way to
 add shared relations) it has to be built in.

That makes sense.

 It's also useful so we can have stuff like the
 'pg_replication_identifier_progress' view which tells you internal_id,
 external_id, remote_lsn, local_lsn. Just showing the internal ID would
 imo be bad.

OK.

 I guess one
 question is: What happens if there are multiple replication solutions
 in use on a single server?  How do they coordinate?

 What's your concern here? You're wondering how they can make sure the
 identifiers they create are non-overlapping?

Yeah, I was just thinking that might be why you installed a catalog
table for this, but now I see that there are several other reasons
also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] replication identifier format

2014-06-18 Thread Andres Freund
On 2014-06-18 12:36:13 -0400, Robert Haas wrote:
  I actually don't think any of the discussions I was involved in had the
  externally visible version of replication identifiers limited to 16bits?
  If you are referring to my patch, 16bits was just the width of the
  *internal* name that should basically never be looked at. User visible
  replication identifiers are always identified by an arbitrary string -
  whose format is determined by the user of the replication identifier
  facility. *BDR* currently stores the system identifier, the database id
  and a name in there - but that's nothing core needs to concern itself
  with.
 
 I don't think you're going to be able to avoid users needing to know
 about those IDs.  The configuration table is going to have to be the
 same on all nodes, and how are you going to get that set up without
 those IDs being user-visible?

Why? Users and other systems only ever see the external ID. Everything
leaving the system is converted to the external form. The short id
basically is only used in shared memory and in wal records. For both
using longer strings would be problematic.

In the patch I have the user can actually see them as they're stored in
pg_replication_identifier, but there should never be a need for that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

