Re: [HACKERS] logical changeset generation v6.4
On Fri, Oct 25, 2013 at 10:58 AM, Andres Freund wrote: > So, I am currently wondering about how to store the "old" tuple, based > on this. Currently it is stored using the TupleDesc of the index the old > tuple is based on. But if we want to allow transporting the entire tuple > that obviously cannot be the only option. > One option would be to change the stored format based on what's > configured, using the relation's TupleDesc if FULL is used. But I think > always using the heap relation's desc is better. I heartily agree. > The not-logged columns would then just be represented as NULLs. That > will make old primary keys bigger if the relation has a high number of > columns and the key small, but I don't think it matters enough. Even if it does matter, the cure seems likely to be worse than the disease. My only other comment is that if NONE is selected, we ought to omit the old tuple altogether, not store one that is all-nulls. But I bet you had that in mind anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
Hi, On 2013-10-22 16:07:16 +0200, Andres Freund wrote: > On 2013-10-21 20:16:29 +0200, Andres Freund wrote: > > Current draft is: > > ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL|DEFAULT > > ALTER TABLE ... REPLICA IDENTITY USING INDEX ...; > > > > which leaves the door open for > > > > ALTER TABLE ... REPLICA IDENTITY USING '(' column_name_list ')'; > > > > Does anybody have a strong feeling about requiring support for CREATE > > TABLE for this? > > Attached is a patch ontop of master implementing this syntax. It's not > wired up to the changeset extraction patch yet as I am not sure whether > others agree about the storage. So, I am currently wondering about how to store the "old" tuple, based on this. Currently it is stored using the TupleDesc of the index the old tuple is based on. But if we want to allow transporting the entire tuple that obviously cannot be the only option. One option would be to change the stored format based on what's configured, using the relation's TupleDesc if FULL is used. But I think always using the heap relation's desc is better. The not-logged columns would then just be represented as NULLs. That will make old primary keys bigger if the relation has a high number of columns and the key small, but I don't think it matters enough. Opinions? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On Tue, Oct 22, 2013 at 10:07 AM, Andres Freund wrote: > On 2013-10-21 20:16:29 +0200, Andres Freund wrote: >> On 2013-10-18 20:50:58 +0200, Andres Freund wrote: >> > How about modifying the selection to go from: >> > * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL; >> > * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname >> > * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb) >> >> Current draft is: >> ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL|DEFAULT >> ALTER TABLE ... REPLICA IDENTITY USING INDEX ...; >> >> which leaves the door open for >> >> ALTER TABLE ... REPLICA IDENTITY USING '(' column_name_list ')'; >> >> Does anybody have a strong feeling about requiring support for CREATE >> TABLE for this? > > Attached is a patch ontop of master implementing this syntax. It's not > wired up to the changeset extraction patch yet as I am not sure whether > others agree about the storage. > > pg_class grew a 'relreplident' char, storing: > * 'd' default > * 'n' nothing > * 'f' full > * 'i' index with indisreplident set, or default if index has been > dropped > pg_index grew a 'indisreplident' bool indicating it is set as the > replica identity for a replident = 'i' relation. > > Both changes shouldn't change the width of the affected relations, they > should reuse existing padding. > > Does somebody prefer a different storage? I had imagined that the storage might consist simply of a pg_attribute boolean. So full would turn them all on, null would turn them all of, etc. But that does make it hard to implement the "whatever the pkey happens to be right now" behavior, so maybe your idea is better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On 2013-10-18 20:50:58 +0200, Andres Freund wrote: > How about modifying the selection to go from: > * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL; > * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname > * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb) Current draft is: ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL|DEFAULT ALTER TABLE ... REPLICA IDENTITY USING INDEX ...; which leaves the door open for ALTER TABLE ... REPLICA IDENTITY USING '(' column_name_list ')'; Does anybody have a strong feeling about requiring support for CREATE TABLE for this? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On 10/21/2013 05:06 PM, Andres Freund wrote: > On 2013-10-21 16:40:43 +0200, Hannu Krosing wrote: >> On 10/18/2013 08:50 PM, Andres Freund wrote: >>> On 2013-10-18 08:11:29 -0400, Robert Haas wrote: >> ... 2. If that seems too complicated, how about just logging the whole old tuple for version 1? >>> I think that'd make the patch much less useful because it bloats WAL >>> unnecessarily for the primary user (replication) of it. I'd rather go >>> for primary keys only if that proves to be the contentious point. >>> >>> How about modifying the selection to go from: >>> * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL; >>> * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname >>> * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb) >>> * primary key >>> * candidate key with the smallest oid >>> >>> Including the candidate key will help people using changeset extration >>> for auditing that do not have primary key. That really isn't an >>> infrequent usecase. >> As I understand it for a table with *no* unique index, >> the "candidate key" is the full tuple, so if we get an UPDATE for >> it then this should be replicated as >> "UPDATE first row matching (NOT DISTINCT FROM) all columns" >> which on replay side will be equivalent to >> CREATE CURSOR ...; FETCH 1 ...; UPDATE ... WHERE CURRENT...' > No, it's not a candidate key since it's not uniquely identifying a > row. You can play tricks as you describe, but that still doesn't make > the whole row a candidate key. > > But anyway, I suggest allowing for logging all columns above... I the "all columns" option this ? How about modifying the selection to go from: * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL; for some reason I thought it to be option to either log or not log PK column ... > >> I know that this will slow down replication, as you can not use direct >> index updates internally - at least not easily - but need to let postgreSQL >> actually plan this, but such single row update is no faster on origin >> either. > That's not actually true. Consider somebody doing something like: > UPDATE big_table_without_indexes SET column = ...; > On the source side that's essentialy O(n). If you replicate on a > row-by-row basis it will be O(n^2) on the replay side. Probably more like O(n^2 / 2) but yes, this is what I meant with the sentence after that ;) Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On 2013-10-21 11:14:37 -0400, Robert Haas wrote: > On Mon, Oct 21, 2013 at 9:51 AM, Andres Freund wrote: > > I have a hard time to understand why you dislike it so much. Think of a > > big schema where you want to add auditing via changeset > > extraction. Because of problems with reindexing primary key you've just > > used candidate keys so far. Why should you go through each of a couple > > of hundred tables and explictly choose an index when you just want an > > identifier of changed rows? > > By nature of it being a candidate key it is *guranteed* to uniquely > > identify a row? And you can make the output plugin give you the used > > columns/the indexname without a problem. > > Sure, well, if a particular user wants to choose candidate keys > essentially at random from among the unique indexes present, there's > nothing to prevent them from writing a script to do that. But > assuming that one unique index is just as good as another is just > wrong. If you pick a "candidate key" that doesn't actually represent > the users' notion of row identity, then your audit log will be > thoroughly useless, even if it does uniquely identify the rows > involved. Why? If the columns are specified in the log, by definition the values will be sufficient to identify a row. Even if a "nicer" key might exist. Since I seemingly can't convince you, I'll modify things that way for now as it can easily be changed later, but I still don't see the problem. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On Mon, Oct 21, 2013 at 9:51 AM, Andres Freund wrote: > I have a hard time to understand why you dislike it so much. Think of a > big schema where you want to add auditing via changeset > extraction. Because of problems with reindexing primary key you've just > used candidate keys so far. Why should you go through each of a couple > of hundred tables and explictly choose an index when you just want an > identifier of changed rows? > By nature of it being a candidate key it is *guranteed* to uniquely > identify a row? And you can make the output plugin give you the used > columns/the indexname without a problem. Sure, well, if a particular user wants to choose candidate keys essentially at random from among the unique indexes present, there's nothing to prevent them from writing a script to do that. But assuming that one unique index is just as good as another is just wrong. If you pick a "candidate key" that doesn't actually represent the users' notion of row identity, then your audit log will be thoroughly useless, even if it does uniquely identify the rows involved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On 2013-10-21 16:40:43 +0200, Hannu Krosing wrote: > On 10/18/2013 08:50 PM, Andres Freund wrote: > > On 2013-10-18 08:11:29 -0400, Robert Haas wrote: > ... > >> 2. If that seems too complicated, how about just logging the whole old > >> tuple for version 1? > > I think that'd make the patch much less useful because it bloats WAL > > unnecessarily for the primary user (replication) of it. I'd rather go > > for primary keys only if that proves to be the contentious point. > > > > How about modifying the selection to go from: > > * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL; > > * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname > > * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb) > > * primary key > > * candidate key with the smallest oid > > > > Including the candidate key will help people using changeset extration > > for auditing that do not have primary key. That really isn't an > > infrequent usecase. > As I understand it for a table with *no* unique index, > the "candidate key" is the full tuple, so if we get an UPDATE for > it then this should be replicated as > "UPDATE first row matching (NOT DISTINCT FROM) all columns" > which on replay side will be equivalent to > CREATE CURSOR ...; FETCH 1 ...; UPDATE ... WHERE CURRENT...' No, it's not a candidate key since it's not uniquely identifying a row. You can play tricks as you describe, but that still doesn't make the whole row a candidate key. But anyway, I suggest allowing for logging all columns above... > I know that this will slow down replication, as you can not use direct > index updates internally - at least not easily - but need to let postgreSQL > actually plan this, but such single row update is no faster on origin > either. That's not actually true. Consider somebody doing something like: UPDATE big_table_without_indexes SET column = ...; On the source side that's essentialy O(n). If you replicate on a row-by-row basis it will be O(n^2) on the replay side. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On 10/18/2013 08:50 PM, Andres Freund wrote: > On 2013-10-18 08:11:29 -0400, Robert Haas wrote: ... >> 2. If that seems too complicated, how about just logging the whole old >> tuple for version 1? > I think that'd make the patch much less useful because it bloats WAL > unnecessarily for the primary user (replication) of it. I'd rather go > for primary keys only if that proves to be the contentious point. > > How about modifying the selection to go from: > * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL; > * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname > * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb) > * primary key > * candidate key with the smallest oid > > Including the candidate key will help people using changeset extration > for auditing that do not have primary key. That really isn't an > infrequent usecase. As I understand it for a table with *no* unique index, the "candidate key" is the full tuple, so if we get an UPDATE for it then this should be replicated as "UPDATE first row matching (NOT DISTINCT FROM) all columns" which on replay side will be equivalent to CREATE CURSOR ...; FETCH 1 ...; UPDATE ... WHERE CURRENT...' I know that this will slow down replication, as you can not use direct index updates internally - at least not easily - but need to let postgreSQL actually plan this, but such single row update is no faster on origin either. Of course when it is a full-table update on a table with no indexes, then doing the same one tuple at a time is really slow. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On 2013-10-21 09:40:13 -0400, Robert Haas wrote: > On Fri, Oct 18, 2013 at 2:50 PM, Andres Freund wrote: > > How about modifying the selection to go from: > > * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL; > > * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname > > * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb) > > * primary key > > * candidate key with the smallest oid > > > > Including the candidate key will help people using changeset extration > > for auditing that do not have primary key. That really isn't an > > infrequent usecase. > > > > I've chosen REPLICA IDENTITY; NOTHIN; FULL; because those are all > > existing keywords, and afaics shouldn't generate any conflicts. On a > > green field we probably name them differently, but ... > > I'm really pretty much dead set against the "candidate key with the > smallest OID" proposal. I think that's just plain old bad idea. It's > just magical behavior which will result in users being surprised and > unhappy. I don't think there's really a problem with saying, hey, if > you configure changeset extraction and you don't configure a replica > identity, then you don't get any columns from the old tuple. I have a hard time to understand why you dislike it so much. Think of a big schema where you want to add auditing via changeset extraction. Because of problems with reindexing primary key you've just used candidate keys so far. Why should you go through each of a couple of hundred tables and explictly choose an index when you just want an identifier of changed rows? By nature of it being a candidate key it is *guranteed* to uniquely identify a row? And you can make the output plugin give you the used columns/the indexname without a problem. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On Fri, Oct 18, 2013 at 2:50 PM, Andres Freund wrote: > How about modifying the selection to go from: > * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL; > * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname > * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb) > * primary key > * candidate key with the smallest oid > > Including the candidate key will help people using changeset extration > for auditing that do not have primary key. That really isn't an > infrequent usecase. > > I've chosen REPLICA IDENTITY; NOTHIN; FULL; because those are all > existing keywords, and afaics shouldn't generate any conflicts. On a > green field we probably name them differently, but ... I'm really pretty much dead set against the "candidate key with the smallest OID" proposal. I think that's just plain old bad idea. It's just magical behavior which will result in users being surprised and unhappy. I don't think there's really a problem with saying, hey, if you configure changeset extraction and you don't configure a replica identity, then you don't get any columns from the old tuple. If you don't like that, change the configuration. It's always nice to spare users unnecessary configuration, of course, but trying to make things simpler than they really are tends to hurt more than it helps. On the naming, I find REPLICA IDENTITY to be pretty good. We've already places where we're using the REPLICA keyword to indicate places where we've got core support intended to dovetail with external replication solutions, and this seems to fit that paradigm nicely. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On 2013-10-18 08:11:29 -0400, Robert Haas wrote: > On Mon, Oct 14, 2013 at 9:12 AM, Andres Freund wrote: > > Attached you can find version 6.4 of the patchset: > > So I'm still unhappy with the arbitrary logic in what's now patch 1 > for choosing the candidate key. On another thread, someone mentioned > that they might want the entire old tuple, and that got me thinking: > there's no particular reason why the user has to want exactly the > columns that exist in some unique, immediate, non-partial index (what > a name). So I have two proposals: > 1. Instead of allowing the user to choose the index to be used, or > picking it for them, how about if we let them choose the old-tuple > columns they want logged? This could be a per-column option. If the > primary key can be assumed known and unchanging, then the answer might > be that the user wants *no* old-tuple columns logged. Contrariwise > someone might want everything logged, or anything in the middle. I definitely can see the usecase for logging anything or nothing, arbitrary column select seems to be too complicated for now. > 2. If that seems too complicated, how about just logging the whole old > tuple for version 1? I think that'd make the patch much less useful because it bloats WAL unnecessarily for the primary user (replication) of it. I'd rather go for primary keys only if that proves to be the contentious point. How about modifying the selection to go from: * all rows if ALTER TABLE ... REPLICA IDENTITY NOTHING|FULL; * index chosen by ALTER TABLE ... REPLICA IDENTITY USING indexname * [later, maybe] ALTER TABLE ... REPLICA IDENTITY (cola, colb) * primary key * candidate key with the smallest oid Including the candidate key will help people using changeset extration for auditing that do not have primary key. That really isn't an infrequent usecase. I've chosen REPLICA IDENTITY; NOTHIN; FULL; because those are all existing keywords, and afaics shouldn't generate any conflicts. On a green field we probably name them differently, but ... Comments? Greetings, Andres Freund PS: candidate key implies a key which is: immediate (aka not deferred), unique, non-partial and only contains NOT NULL columns. -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On Fri, Oct 18, 2013 at 7:11 AM, Robert Haas wrote: > On Mon, Oct 14, 2013 at 9:12 AM, Andres Freund wrote: >> Attached you can find version 6.4 of the patchset: > > So I'm still unhappy with the arbitrary logic in what's now patch 1 > for choosing the candidate key. On another thread, someone mentioned > that they might want the entire old tuple, and that got me thinking: > there's no particular reason why the user has to want exactly the > columns that exist in some unique, immediate, non-partial index (what > a name). So I have two proposals: Aside: what's an immediate index? Is this speaking to the constraint? (immediate vs deferred?) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical changeset generation v6.4
On Mon, Oct 14, 2013 at 9:12 AM, Andres Freund wrote: > Attached you can find version 6.4 of the patchset: So I'm still unhappy with the arbitrary logic in what's now patch 1 for choosing the candidate key. On another thread, someone mentioned that they might want the entire old tuple, and that got me thinking: there's no particular reason why the user has to want exactly the columns that exist in some unique, immediate, non-partial index (what a name). So I have two proposals: 1. Instead of allowing the user to choose the index to be used, or picking it for them, how about if we let them choose the old-tuple columns they want logged? This could be a per-column option. If the primary key can be assumed known and unchanging, then the answer might be that the user wants *no* old-tuple columns logged. Contrariwise someone might want everything logged, or anything in the middle. 2. If that seems too complicated, how about just logging the whole old tuple for version 1? I'm basically fine with the rest of what's in the first two patches, but we need to sort out some kind of consensus on this issue. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers