Hi

On Wed Feb 25, 2026 at 4:03 PM -03, Masahiko Sawada wrote:
>> After more investigation of slowness, it seems that the
>> list_concat_unique_oid() called below is quite slow when the database
>> has a lot of tables to publish:
>>
>>     relids = GetPublicationRelations(pub_elem->oid,
>>                                      pub_elem->pubviaroot ?
>>                                      PUBLICATION_PART_ROOT :
>>                                      PUBLICATION_PART_LEAF);
>>     schemarelids = GetAllSchemaPublicationRelations(pub_elem->oid,
>>                                                      pub_elem->pubviaroot ?
>>                                                      PUBLICATION_PART_ROOT :
>>                                                      PUBLICATION_PART_LEAF);
>>     pub_elem_tables = list_concat_unique_oid(relids, schemarelids);
>>
>> This is simply because it's O(n^2), where n is the number of oids in
>> schemarelids in the test case. A simple change would be to do sort &
>> dedup instead. With the attached experimental patch, the
>> pg_get_publication_tables() execution time gets halved in my
>> environment (796ms -> 430ms with 50k tables). If the number of tables
>> is not large, this method might be slower than today but it's not a
>> huge regression.
>>
>> In the initial tablesync cases, it could be optimized further in a way
>> that we introduce a new SQL function that gets the column list and
>> expr of the specific table. This way, we can filter the result by
>> relid at an early stage instead of getting all information and
>> filtering by relid as the tablesync worker does today, avoiding
>> overheads of gathering system catalog scan results.
>
> I've drafted this idea and I find it looks like a better approach. The
> patch introduces the pg_get_publication_table_info() SQL function that
> returns the column list and row filter expression like
> pg_get_publication_tables() returns but it checks only the specific
> table unlike pg_get_publication_tables(). On my env, the tablesync
> worker's query in question becomes 0.6ms from 288 ms with 50k tables
> in one publication. Feedback is very welcome.
>

Thanks for the patch. I did a review; here are my comments:

+       values[0] = ObjectIdGetDatum(pub->oid);
+       values[1] = ObjectIdGetDatum(relid);
+
+       values[0] = ObjectIdGetDatum(pub->oid);
+       values[1] = ObjectIdGetDatum(relid);

Duplicated assignments?

--------------

+                 /* ALL TALBES publication */

Typo: "TALBES" should be "TABLES".

--------------

+ * Common routine for pg_get_publication_tables() and
+ * pg_get_publication_table_info() to construct the result tuple.
+ */
+static HeapTuple
+construct_published_rel_tuple(published_rel *table_info, TupleDesc tuple_desc)

construct_published_rel_tuple() is currently only used in
pg_get_publication_table_info(). Perhaps it could also be used in the "if
(funcctx->call_cntr < list_length(table_infos))" block of
pg_get_publication_tables()?

--------------

Are new regression tests needed, or do the existing ones already cover the
new function well enough? The code of pg_get_publication_table_info() seems
well covered.

--------------

It seems that pgindent has not been run on src/backend/catalog/pg_publication.c.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com

