Re: Logical Replication Custom Column Expression
>> And yes, probably you need to change the way you reply to email on >> this list. Top-posting is generally avoided. See >> https://wiki.postgresql.org/wiki/Mailing_Lists. >Thanks for bringing this into the discussion :) Thinking these days more about this topic, subscriber name is not a bad idea, although it makes sense to be able to give your own value even on subscriber level, for example an integer. Having a custom integer value is better as definitely this integer will participate later in all joins beside the tables and for sure joining with an integer it would be quicker rather than joining on a character varying (plus the rest of the columns). In addition, discussing with other people and also on Stack Overflow/DBAExchange I have found that other people think it is a great enhancement for analytical purposes. Στις Τετ 30 Νοε 2022 στις 10:39 π.μ., ο/η Stavros Koureas < koureasstav...@gmail.com> έγραψε: > > > Στις Τρί 29 Νοε 2022 στις 3:27 μ.μ., ο/η Ashutosh Bapat < > ashutosh.bapat@gmail.com> έγραψε: > > That would be too restrictive - not necessarily in your application > > but generally. There could be some tables where consolidating rows > > with same PK from different publishers into a single row in subscriber > > would be desirable. I think we need to enable the property for every > > subscriber that intends to add publisher column to the desired and > > subscribed tables. But there should be another option per table which > > will indicate that receiver should add publisher when INSERTING row to > > that table. > > So we are discussing the scope level of this property, if this property > will be implemented on subscriber level or on subscriber table. > In that case I am not sure how this will be implemented as currently > postgres subscribers can have multiple tables streamed from a single > publisher. > In that case we may have an additional syntax on subscriber, for example: > > CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost port=5432 > user=postgres password=XX dbname=publisher1' PUBLICATION pub1 with > (enabled = false, create_slot = false, slot_name = NONE, tables = > {tableA:union, tableB:none, }); > > Something like this? > > > And yes, probably you need to change the way you reply to email on > > this list. Top-posting is generally avoided. See > > https://wiki.postgresql.org/wiki/Mailing_Lists. > > Thanks for bringing this into the discussion :) >
Re: Logical Replication Custom Column Expression
On Wed, Nov 30, 2022 at 2:09 PM Stavros Koureas wrote: > > > > Στις Τρί 29 Νοε 2022 στις 3:27 μ.μ., ο/η Ashutosh Bapat > έγραψε: > > That would be too restrictive - not necessarily in your application > > but generally. There could be some tables where consolidating rows > > with same PK from different publishers into a single row in subscriber > > would be desirable. I think we need to enable the property for every > > subscriber that intends to add publisher column to the desired and > > subscribed tables. But there should be another option per table which > > will indicate that receiver should add publisher when INSERTING row to > > that table. > > So we are discussing the scope level of this property, if this property will > be implemented on subscriber level or on subscriber table. > In that case I am not sure how this will be implemented as currently postgres > subscribers can have multiple tables streamed from a single publisher. > In that case we may have an additional syntax on subscriber, for example: > > CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost port=5432 user=postgres > password=XX dbname=publisher1' PUBLICATION pub1 with (enabled = false, > create_slot = false, slot_name = NONE, tables = {tableA:union, tableB:none, > }); > > Something like this? Nope, I think we will need to add a table level property through table options or receiver can infer it by looking at the table columns - e.g. existence of origin_id column or some such thing. -- Best Wishes, Ashutosh Bapat
Re: Logical Replication Custom Column Expression
Στις Τρί 29 Νοε 2022 στις 3:27 μ.μ., ο/η Ashutosh Bapat < ashutosh.bapat@gmail.com> έγραψε: > That would be too restrictive - not necessarily in your application > but generally. There could be some tables where consolidating rows > with same PK from different publishers into a single row in subscriber > would be desirable. I think we need to enable the property for every > subscriber that intends to add publisher column to the desired and > subscribed tables. But there should be another option per table which > will indicate that receiver should add publisher when INSERTING row to > that table. So we are discussing the scope level of this property, if this property will be implemented on subscriber level or on subscriber table. In that case I am not sure how this will be implemented as currently postgres subscribers can have multiple tables streamed from a single publisher. In that case we may have an additional syntax on subscriber, for example: CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost port=5432 user=postgres password=XX dbname=publisher1' PUBLICATION pub1 with (enabled = false, create_slot = false, slot_name = NONE, tables = {tableA:union, tableB:none, }); Something like this? > And yes, probably you need to change the way you reply to email on > this list. Top-posting is generally avoided. See > https://wiki.postgresql.org/wiki/Mailing_Lists. Thanks for bringing this into the discussion :)
Re: Logical Replication Custom Column Expression
On Fri, Nov 25, 2022 at 4:13 PM Stavros Koureas wrote: > > Yes, if the property is on the subscription side then it should be applied > for all the tables that the connected publication is exposing. > So if the property is enabled you should be sure that this origin column > exists to all of the tables that the publication is exposing... > That would be too restrictive - not necessarily in your application but generally. There could be some tables where consolidating rows with same PK from different publishers into a single row in subscriber would be desirable. I think we need to enable the property for every subscriber that intends to add publisher column to the desired and subscribed tables. But there should be another option per table which will indicate that receiver should add publisher when INSERTING row to that table. > Sure this is the complete idea, that the subscriber should match the PK of > origin, > As the subscription table will contain same key values from different > origins, for example: > And yes, probably you need to change the way you reply to email on this list. Top-posting is generally avoided. See https://wiki.postgresql.org/wiki/Mailing_Lists. -- Best Wishes, Ashutosh Bapat
Re: Logical Replication Custom Column Expression
Sure I understand and neither do I have good knowledge of what else could be influenced by such a change. If the value of the column is the subscriber name has no benefit to this idea of merging multiple upstreams with same primary keys, later you describe the "connection dbname", yes this could be a possibility. I do not fully understand that part "how will the initial tablesync COPY efficiently assign these subscriber name column values?" Why is difficult that during the initial sync put everywhere the same value for all rows of the same origin? Στις Δευ 28 Νοε 2022 στις 10:16 π.μ., ο/η Peter Smith έγραψε: > On Fri, Nov 25, 2022 at 9:43 PM Stavros Koureas > wrote: > > > > Yes, if the property is on the subscription side then it should be > applied for all the tables that the connected publication is exposing. > > So if the property is enabled you should be sure that this origin column > exists to all of the tables that the publication is exposing... > > > > Sure this is the complete idea, that the subscriber should match the PK > of origin, > > As the subscription table will contain same key values from different > origins, for example: > > > > For publisher1 database table > > id pk integer | value character varying > > 1 | testA1 > > 2 | testA2 > > > > For publisher2 database table > > id pk integer | value character varying > > 1 | testB1 > > 2 | testB2 > > > > For subscriber database table > > origin pk character varying | id pk integer | value character varying > > publisher1 | 1 | testA1 > > publisher1 | 2 | testA2 > > publisher2 | 1 | testB1 > > publisher2 | 2 | testB2 > > > > All statements INSERT, UPDATE, DELETE should always include the > predicate of the origin. > > > > This sounds similar to what I had posted [1] although I was saying the > generated column value might be the *subscriber* name, not the origin > publisher name. (where are you getting that value from -- somehow from > the subscriptions' CONNECTION dbname?) > > Anyway, regardless of the details, please note -- my idea was really > intended just as a discussion starting point to demonstrate that > required functionality might be achieved using a simpler syntax than > what had been previously suggested. But in practice there may be some > problems with this approach -- e.g. how will the initial tablesync > COPY efficiently assign these subscriber name column values? > > -- > [1] > https://www.postgresql.org/message-id/CAHut%2BPuZowXd7Aa7t0nqjP6afHMwJarngzeMq%2BQP0vE2KKLOgQ%40mail.gmail.com > > Kind Regards, > Peter Smith. > Fujitsu Australia. >
Re: Logical Replication Custom Column Expression
On Fri, Nov 25, 2022 at 9:43 PM Stavros Koureas wrote: > > Yes, if the property is on the subscription side then it should be applied > for all the tables that the connected publication is exposing. > So if the property is enabled you should be sure that this origin column > exists to all of the tables that the publication is exposing... > > Sure this is the complete idea, that the subscriber should match the PK of > origin, > As the subscription table will contain same key values from different > origins, for example: > > For publisher1 database table > id pk integer | value character varying > 1 | testA1 > 2 | testA2 > > For publisher2 database table > id pk integer | value character varying > 1 | testB1 > 2 | testB2 > > For subscriber database table > origin pk character varying | id pk integer | value character varying > publisher1 | 1 | testA1 > publisher1 | 2 | testA2 > publisher2 | 1 | testB1 > publisher2 | 2 | testB2 > > All statements INSERT, UPDATE, DELETE should always include the predicate of > the origin. > This sounds similar to what I had posted [1] although I was saying the generated column value might be the *subscriber* name, not the origin publisher name. (where are you getting that value from -- somehow from the subscriptions' CONNECTION dbname?) Anyway, regardless of the details, please note -- my idea was really intended just as a discussion starting point to demonstrate that required functionality might be achieved using a simpler syntax than what had been previously suggested. But in practice there may be some problems with this approach -- e.g. how will the initial tablesync COPY efficiently assign these subscriber name column values? -- [1] https://www.postgresql.org/message-id/CAHut%2BPuZowXd7Aa7t0nqjP6afHMwJarngzeMq%2BQP0vE2KKLOgQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia.
Re: Logical Replication Custom Column Expression
Yes, if the property is on the subscription side then it should be applied for all the tables that the connected publication is exposing. So if the property is enabled you should be sure that this origin column exists to all of the tables that the publication is exposing... Sure this is the complete idea, that the subscriber should match the PK of origin, As the subscription table will contain same key values from different origins, for example: *For publisher1 database **table* id pk integer | value character varying 1 | testA1 2 | testA2 *For publisher2 database **table* id pk integer | value character varying 1 | testB1 2 | testB2 *For subscriber database table* origin *pk *character varying | id *pk *integer | value character varying publisher1 | 1 | testA1 publisher1 | 2 | testA2 publisher2 | 1 | testB1 publisher2 | 2 | testB2 All statements INSERT, UPDATE, DELETE should always include the predicate of the origin. Στις Παρ 25 Νοε 2022 στις 12:21 μ.μ., ο/η Ashutosh Bapat < ashutosh.bapat@gmail.com> έγραψε: > On Wed, Nov 23, 2022 at 4:54 AM Peter Smith wrote: > > > > On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas > > wrote: > > > > > > Reading more carefully what you described, I think you are interested > in getting something you call origin from publishers, probably some > metadata from the publications. > > > > > > This identifier in those metadata maybe does not have business value > on the reporting side. The idea is to use a value which has specific > meaning to the user at the end. > > > > > > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at > the end based on a dimension table which holds this mapping the user would > be able to filter the data. So programmatically the user can set the id > value of the column plus creating the mapping table from an application > let’s say and be able to distinguish the data. > > > > > > In addition this column should have the ability to be part of the > primary key on the subscription table in order to not conflict with lines > from other tenants having the same keys. > > > > > > > > > > I was wondering if a simpler syntax solution might also work here. > > > > Imagine another SUBSCRIPTION parameter that indicates to write the > > *name* of the subscription to some pre-defined table column: > > e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1 > > CONNECTION '...' WITH (subscription_column); > > > > Logical Replication already allows the subscriber table to have extra > > columns, so you just need to manually create the extra 'subscription' > > column up-front. > > > > Then... > > > > ~~ > > > > On Publisher: > > > > test_pub=# CREATE TABLE tab(id int primary key, description varchar); > > CREATE TABLE > > > > test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three'); > > INSERT 0 3 > > > > test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES; > > CREATE PUBLICATION > > > > ~~ > > > > On Subscriber: > > > > test_sub=# CREATE TABLE tab(id int, description varchar, subscription > varchar); > > CREATE TABLE > > > > test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost > > dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column); > > CREATE SUBSCRIPTION > > > > test_sub=# SELECT * FROM tab; > > id | description | subscription > > +-+-- > > 1 | one | sub_tenant1 > > 2 | two | sub_tenant1 > > 3 | three | sub_tenant1 > > (3 rows) > > > > ~~ > > > Thanks for the example. This is more concrete than just verbal description. > > In this example, do all the tables that a subscription subscribes to > need that additional column or somehow the pglogical receiver will > figure out which tables have that column and populate rows > accordingly? > > My further fear is that the subscriber will also need to match the > subscription column along with the rest of PK so as not to update rows > from other subscriptions. > -- > Best Wishes, > Ashutosh Bapat >
Re: Logical Replication Custom Column Expression
On Wed, Nov 23, 2022 at 4:54 AM Peter Smith wrote: > > On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas > wrote: > > > > Reading more carefully what you described, I think you are interested in > > getting something you call origin from publishers, probably some metadata > > from the publications. > > > > This identifier in those metadata maybe does not have business value on the > > reporting side. The idea is to use a value which has specific meaning to > > the user at the end. > > > > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the end > > based on a dimension table which holds this mapping the user would be able > > to filter the data. So programmatically the user can set the id value of > > the column plus creating the mapping table from an application let’s say > > and be able to distinguish the data. > > > > In addition this column should have the ability to be part of the primary > > key on the subscription table in order to not conflict with lines from > > other tenants having the same keys. > > > > > > I was wondering if a simpler syntax solution might also work here. > > Imagine another SUBSCRIPTION parameter that indicates to write the > *name* of the subscription to some pre-defined table column: > e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1 > CONNECTION '...' WITH (subscription_column); > > Logical Replication already allows the subscriber table to have extra > columns, so you just need to manually create the extra 'subscription' > column up-front. > > Then... > > ~~ > > On Publisher: > > test_pub=# CREATE TABLE tab(id int primary key, description varchar); > CREATE TABLE > > test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three'); > INSERT 0 3 > > test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES; > CREATE PUBLICATION > > ~~ > > On Subscriber: > > test_sub=# CREATE TABLE tab(id int, description varchar, subscription > varchar); > CREATE TABLE > > test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost > dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column); > CREATE SUBSCRIPTION > > test_sub=# SELECT * FROM tab; > id | description | subscription > +-+-- > 1 | one | sub_tenant1 > 2 | two | sub_tenant1 > 3 | three | sub_tenant1 > (3 rows) > > ~~ > Thanks for the example. This is more concrete than just verbal description. In this example, do all the tables that a subscription subscribes to need that additional column or somehow the pglogical receiver will figure out which tables have that column and populate rows accordingly? My further fear is that the subscriber will also need to match the subscription column along with the rest of PK so as not to update rows from other subscriptions. -- Best Wishes, Ashutosh Bapat
Re: Logical Replication Custom Column Expression
Just one correction for the subscriber On Subscriber: test_sub=# CREATE TABLE tab(id int *pkey*, description varchar, subscription varchar *pkey*); CREATE TABLE The subscription table should have the same primary key columns as the publisher plus one more. We need to make sure that on update only the same origin data is being updated. Στις Τετ 23 Νοε 2022 στις 1:24 π.μ., ο/η Peter Smith έγραψε: > On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas > wrote: > > > > Reading more carefully what you described, I think you are interested in > getting something you call origin from publishers, probably some metadata > from the publications. > > > > This identifier in those metadata maybe does not have business value on > the reporting side. The idea is to use a value which has specific meaning > to the user at the end. > > > > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the > end based on a dimension table which holds this mapping the user would be > able to filter the data. So programmatically the user can set the id value > of the column plus creating the mapping table from an application let’s say > and be able to distinguish the data. > > > > In addition this column should have the ability to be part of the > primary key on the subscription table in order to not conflict with lines > from other tenants having the same keys. > > > > > > I was wondering if a simpler syntax solution might also work here. > > Imagine another SUBSCRIPTION parameter that indicates to write the > *name* of the subscription to some pre-defined table column: > e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1 > CONNECTION '...' WITH (subscription_column); > > Logical Replication already allows the subscriber table to have extra > columns, so you just need to manually create the extra 'subscription' > column up-front. > > Then... > > ~~ > > On Publisher: > > test_pub=# CREATE TABLE tab(id int primary key, description varchar); > CREATE TABLE > > test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three'); > INSERT 0 3 > > test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES; > CREATE PUBLICATION > > ~~ > > On Subscriber: > > test_sub=# CREATE TABLE tab(id int, description varchar, subscription > varchar); > CREATE TABLE > > test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost > dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column); > CREATE SUBSCRIPTION > > test_sub=# SELECT * FROM tab; > id | description | subscription > +-+-- > 1 | one | sub_tenant1 > 2 | two | sub_tenant1 > 3 | three | sub_tenant1 > (3 rows) > > ~~ > > Subscriptions to different tenants would be named differently. > > And using other SQL you can map/filter those names however your > application wants. > > -- > Kind Regards, > Peter Smith. > Fujitsu Australia >
Re: Logical Replication Custom Column Expression
It's easy to answer this question. Imagine that in a software company who sells the product and also offers reporting solutions, the ERP tables will not have this additional column to all the tables. Now the reporting department comes and needs to consolidate all that data from different databases (publishers) and create one multitenant database to have all the data. So in an ERP like NAV or anything else you cannot suggest change all the code to all of the tables plus all functions to add one additional column to this table, even that was possible then you cannot work with integers but you need to work with GUIDs as this column should be predefined to each ERP. Then joining with GUID in the second phase for reporting definitely will slow down the performance. In summary: 1. Cannot touch the underlying source (important) 2. GUID identifier column will slow down the reporting performance Στις Τετ 23 Νοε 2022 στις 5:19 π.μ., ο/η Amit Kapila < amit.kapil...@gmail.com> έγραψε: > On Wed, Nov 23, 2022 at 1:40 AM Stavros Koureas > wrote: > > > > Reading more carefully what you described, I think you are interested in > getting something you call origin from publishers, probably some metadata > from the publications. > > > > This identifier in those metadata maybe does not have business value on > the reporting side. The idea is to use a value which has specific meaning > to the user at the end. > > > > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the > end based on a dimension table which holds this mapping the user would be > able to filter the data. So programmatically the user can set the id value > of the column plus creating the mapping table from an application let’s say > and be able to distinguish the data. > > > > In your example, are different tenants represent different publisher > nodes? If so, why can't we have a predefined column and value for the > required tables on each publisher rather than logical replication > generate that value while replicating data? > > -- > With Regards, > Amit Kapila. >
Re: Logical Replication Custom Column Expression
On Wed, Nov 23, 2022 at 1:40 AM Stavros Koureas wrote: > > Reading more carefully what you described, I think you are interested in > getting something you call origin from publishers, probably some metadata > from the publications. > > This identifier in those metadata maybe does not have business value on the > reporting side. The idea is to use a value which has specific meaning to the > user at the end. > > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the end > based on a dimension table which holds this mapping the user would be able to > filter the data. So programmatically the user can set the id value of the > column plus creating the mapping table from an application let’s say and be > able to distinguish the data. > In your example, are different tenants represent different publisher nodes? If so, why can't we have a predefined column and value for the required tables on each publisher rather than logical replication generate that value while replicating data? -- With Regards, Amit Kapila.
Re: Logical Replication Custom Column Expression
On Tue, Nov 22, 2022 at 6:22 PM Stavros Koureas wrote: > > Sure, this can be implemented as a subscription option, and it will cover > this use case scenario as each subscriber points only to one database. > I also have some more analytical/reporting use-cases which need additions in > logical-replication, I am not sure if I need to open different discussions > for each one, all ideas are for publication/subscription. > I think to some extent it depends on how unique each idea is but initially you may want to post here and then we can spin off different threads for a discussion if required. Are you interested in working on one or more of those ideas to make them reality or do you want others to pick up based on their interest? -- With Regards, Amit Kapila.
Re: Logical Replication Custom Column Expression
On Wed, Nov 23, 2022 at 7:38 AM Stavros Koureas wrote: > > Reading more carefully what you described, I think you are interested in > getting something you call origin from publishers, probably some metadata > from the publications. > > This identifier in those metadata maybe does not have business value on the > reporting side. The idea is to use a value which has specific meaning to the > user at the end. > > For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the end > based on a dimension table which holds this mapping the user would be able to > filter the data. So programmatically the user can set the id value of the > column plus creating the mapping table from an application let’s say and be > able to distinguish the data. > > In addition this column should have the ability to be part of the primary key > on the subscription table in order to not conflict with lines from other > tenants having the same keys. > > I was wondering if a simpler syntax solution might also work here. Imagine another SUBSCRIPTION parameter that indicates to write the *name* of the subscription to some pre-defined table column: e.g. CREATE SUBSCRIPTION subname FOR PUBLICATION pub_tenant_1 CONNECTION '...' WITH (subscription_column); Logical Replication already allows the subscriber table to have extra columns, so you just need to manually create the extra 'subscription' column up-front. Then... ~~ On Publisher: test_pub=# CREATE TABLE tab(id int primary key, description varchar); CREATE TABLE test_pub=# INSERT INTO tab VALUES (1,'one'),(2,'two'),(3,'three'); INSERT 0 3 test_pub=# CREATE PUBLICATION tenant1 FOR ALL TABLES; CREATE PUBLICATION ~~ On Subscriber: test_sub=# CREATE TABLE tab(id int, description varchar, subscription varchar); CREATE TABLE test_sub=# CREATE SUBSCRIPTION sub_tenant1 CONNECTION 'host=localhost dbname=test_pub' PUBLICATION tenant1 WITH (subscription_column); CREATE SUBSCRIPTION test_sub=# SELECT * FROM tab; id | description | subscription +-+-- 1 | one | sub_tenant1 2 | two | sub_tenant1 3 | three | sub_tenant1 (3 rows) ~~ Subscriptions to different tenants would be named differently. And using other SQL you can map/filter those names however your application wants. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: Logical Replication Custom Column Expression
Reading more carefully what you described, I think you are interested in getting something you call origin from publishers, probably some metadata from the publications. This identifier in those metadata maybe does not have business value on the reporting side. The idea is to use a value which has specific meaning to the user at the end. For example assigning 1 for tenant 1, 2 for tenant 2 and so one, at the end based on a dimension table which holds this mapping the user would be able to filter the data. So programmatically the user can set the id value of the column plus creating the mapping table from an application let’s say and be able to distinguish the data. In addition this column should have the ability to be part of the primary key on the subscription table in order to not conflict with lines from other tenants having the same keys. > > 22 Νοε 2022, 14:52, ο χρήστης «Stavros Koureas » > έγραψε: > > > Sure, this can be implemented as a subscription option, and it will cover > this use case scenario as each subscriber points only to one database. > I also have some more analytical/reporting use-cases which need additions in > logical-replication, I am not sure if I need to open different discussions > for each one, all ideas are for publication/subscription. > > Στις Τρί 22 Νοε 2022 στις 2:22 μ.μ., ο/η Amit Kapila > έγραψε: >> On Mon, Nov 21, 2022 at 5:05 PM Ashutosh Bapat >> wrote: >> > >> > On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas >> > wrote: >> > > >> > > What does not support is the option for defining custom column >> > > expressions, as keys or values, into the upstream (publication). This >> > > will give more flexibility into making replication from multiple >> > > upstreams into less downstreams adding more logic. For instance, in a >> > > project for analytical purposes there is the need to consolidate data >> > > from multiple databases into one and at the same time keep the origin of >> > > each replicated data identified by a tenanant_id column. In this case we >> > > also need the ability to define the new column as an additional key >> > > which will participate into the destination table. >> > > >> > > Tenant 1 table >> > > id serial pk >> > > description varchar >> > > >> > > Tenant 2 table >> > > id integer pk >> > > description varchar >> > > >> > > Group table >> > > tenant integer pk >> > > id integer pk >> > > description varchar >> > > >> > > Possible syntax to archive that >> > > CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} >> > > ,"id", "name") >> > > >> > > Example >> > > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} >> > > ,"id", "name") >> > >> > I think that's a valid usecase. >> > >> > This looks more like a subscription option to me. In multi-subscriber >> > multi-publisher scenarios, on one subscriber a given upstream may be >> > tenant 1 but on some other it could be 2. But I don't think we allow >> > specifying subscription options for a single table. AFAIU, the origin >> > ids are available as part of the commit record which contained this >> > change; that's how conflict resolution is supposed to know it. So >> > somehow the subscriber will need to fetch those from there and set the >> > tenant. >> > >> >> Yeah, to me also it appears that we can handle it on the subscriber >> side. We have the provision of sending origin information in proto.c. >> But note that by default publishers won't have any origin associated >> with change unless someone has defined it. I think this work needs >> more thought but sounds to be an interesting feature. >> >> -- >> With Regards, >> Amit Kapila.
Re: Logical Replication Custom Column Expression
Sure, this can be implemented as a subscription option, and it will cover this use case scenario as each subscriber points only to one database. I also have some more analytical/reporting use-cases which need additions in logical-replication, I am not sure if I need to open different discussions for each one, all ideas are for publication/subscription. Στις Τρί 22 Νοε 2022 στις 2:22 μ.μ., ο/η Amit Kapila < amit.kapil...@gmail.com> έγραψε: > On Mon, Nov 21, 2022 at 5:05 PM Ashutosh Bapat > wrote: > > > > On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas > > wrote: > > > > > > What does not support is the option for defining custom column > expressions, as keys or values, into the upstream (publication). This will > give more flexibility into making replication from multiple upstreams into > less downstreams adding more logic. For instance, in a project for > analytical purposes there is the need to consolidate data from multiple > databases into one and at the same time keep the origin of each replicated > data identified by a tenanant_id column. In this case we also need the > ability to define the new column as an additional key which will > participate into the destination table. > > > > > > Tenant 1 table > > > id serial pk > > > description varchar > > > > > > Tenant 2 table > > > id integer pk > > > description varchar > > > > > > Group table > > > tenant integer pk > > > id integer pk > > > description varchar > > > > > > Possible syntax to archive that > > > CREATE PUBLICATION pb_test FOR TABLE test > ({value:datatype:iskey:alias} ,"id", "name") > > > > > > Example > > > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} > ,"id", "name") > > > > I think that's a valid usecase. > > > > This looks more like a subscription option to me. In multi-subscriber > > multi-publisher scenarios, on one subscriber a given upstream may be > > tenant 1 but on some other it could be 2. But I don't think we allow > > specifying subscription options for a single table. AFAIU, the origin > > ids are available as part of the commit record which contained this > > change; that's how conflict resolution is supposed to know it. So > > somehow the subscriber will need to fetch those from there and set the > > tenant. > > > > Yeah, to me also it appears that we can handle it on the subscriber > side. We have the provision of sending origin information in proto.c. > But note that by default publishers won't have any origin associated > with change unless someone has defined it. I think this work needs > more thought but sounds to be an interesting feature. > > -- > With Regards, > Amit Kapila. >
Re: Logical Replication Custom Column Expression
On Mon, Nov 21, 2022 at 5:05 PM Ashutosh Bapat wrote: > > On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas > wrote: > > > > What does not support is the option for defining custom column expressions, > > as keys or values, into the upstream (publication). This will give more > > flexibility into making replication from multiple upstreams into less > > downstreams adding more logic. For instance, in a project for analytical > > purposes there is the need to consolidate data from multiple databases into > > one and at the same time keep the origin of each replicated data identified > > by a tenanant_id column. In this case we also need the ability to define > > the new column as an additional key which will participate into the > > destination table. > > > > Tenant 1 table > > id serial pk > > description varchar > > > > Tenant 2 table > > id integer pk > > description varchar > > > > Group table > > tenant integer pk > > id integer pk > > description varchar > > > > Possible syntax to archive that > > CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} > > ,"id", "name") > > > > Example > > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} ,"id", > > "name") > > I think that's a valid usecase. > > This looks more like a subscription option to me. In multi-subscriber > multi-publisher scenarios, on one subscriber a given upstream may be > tenant 1 but on some other it could be 2. But I don't think we allow > specifying subscription options for a single table. AFAIU, the origin > ids are available as part of the commit record which contained this > change; that's how conflict resolution is supposed to know it. So > somehow the subscriber will need to fetch those from there and set the > tenant. > Yeah, to me also it appears that we can handle it on the subscriber side. We have the provision of sending origin information in proto.c. But note that by default publishers won't have any origin associated with change unless someone has defined it. I think this work needs more thought but sounds to be an interesting feature. -- With Regards, Amit Kapila.
Re: Logical Replication Custom Column Expression
On Sat, Nov 19, 2022 at 6:47 PM Stavros Koureas wrote: > > Hi all, > > Working with PostgreSQL Logical Replication is just great! It helps a lot > doing real time replication for analytical purposes without using any other > 3d party service. Although all these years working as product architect of > reporting i have noted a few requirements which are always a challenge and > may help enhance logical replication even better. > > To the point: > PostgreSQL14 Logical Replication allows replication of a table to another > table that exists in another database or even in another host. It also allows > multiple upstream tables using the same structure to downstream into a single > table. > CREATE PUBLICATION pb_test FOR TABLE test > > PostgreSQL15 Logical Replication allows even better replication options, like > selecting subsets of the columns from publisher tables. It also supports > plenty of options like disable_on_error etc. > CREATE PUBLICATION pb_test FOR TABLE test ("id", "name") > > What does not support is the option for defining custom column expressions, > as keys or values, into the upstream (publication). This will give more > flexibility into making replication from multiple upstreams into less > downstreams adding more logic. For instance, in a project for analytical > purposes there is the need to consolidate data from multiple databases into > one and at the same time keep the origin of each replicated data identified > by a tenanant_id column. In this case we also need the ability to define the > new column as an additional key which will participate into the destination > table. > > Tenant 1 table > id serial pk > description varchar > > Tenant 2 table > id integer pk > description varchar > > Group table > tenant integer pk > id integer pk > description varchar > > Possible syntax to archive that > CREATE PUBLICATION pb_test FOR TABLE test ({value:datatype:iskey:alias} > ,"id", "name") > > Example > CREATE PUBLICATION pb_test FOR TABLE test ({1:integer:true:tenant} ,"id", > "name") I think that's a valid usecase. This looks more like a subscription option to me. In multi-subscriber multi-publisher scenarios, on one subscriber a given upstream may be tenant 1 but on some other it could be 2. But I don't think we allow specifying subscription options for a single table. AFAIU, the origin ids are available as part of the commit record which contained this change; that's how conflict resolution is supposed to know it. So somehow the subscriber will need to fetch those from there and set the tenant. -- Best Wishes, Ashutosh Bapat