On Mon, Jul 12, 2021 at 1:09 AM Euler Taveira <eu...@eulerto.com> wrote: > > I did another measure using as baseline the previous patch (v16). > > without cache (v16) > --------------------------- > > mean: 1.46 us > stddev: 2.13 us > median: 1.39 us > min-max: [0.69 .. 1456.69] us > percentile(99): 3.15 us > mode: 0.91 us > > with cache (v18) > ----------------------- > > mean: 0.63 us > stddev: 1.07 us > median: 0.55 us > min-max: [0.29 .. 844.87] us > percentile(99): 1.38 us > mode: 0.41 us > > It represents -57%. It is a really good optimization for just a few extra > lines > of code. >
Good improvement but I think it is better to measure the performance by using synchronous_replication by setting the subscriber as standby_synchronous_names, which will provide the overall saving of time. We can probably see when the timings when no rows are filtered, when 10% rows are filtered when 30% are filtered and so on. I think the way caching has been done in the patch is a bit inefficient. Basically, it always invalidates and rebuilds the expressions even though some unrelated operation has happened on publication. For example, say publication has initially table t1 with rowfilter r1 for which we have cached the state. Now you altered publication and added table t2, it will invalidate the entire state of t1 as well. I think we can avoid that if we invalidate the rowfilter related state only on relcache invalidation i.e in rel_sync_cache_relation_cb and save it the very first time we prepare the expression. In that case, we don't need to do it in advance when preparing relsyncentry, this will have the additional advantage that we won't spend cycles on preparing state unless it is required (for truncate we won't require row_filtering, so it won't be prepared). Few other things, I have noticed: 1. I am seeing tupledesc leak by following below steps: ERROR: tupdesc reference 00000000008D7D18 is not owned by resource owner TopTransaction CONTEXT: slot "tap_sub", output plugin "pgoutput", in the change callback, associated LSN 0/170BD50 Publisher CREATE TABLE tab_rowfilter_1 (a int primary key, b text); CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1 WHERE (a > 1000 AND b <> 'filtered'); Subscriber CREATE TABLE tab_rowfilter_1 (a int primary key, b text); CREATE SUBSCRIPTION tap_sub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION tap_pub_1; Publisher INSERT INTO tab_rowfilter_1 (a, b) VALUES (1980, 'not filtered'); Alter table tab_rowfilter_1 drop column b cascade; INSERT INTO tab_rowfilter_1 (a) VALUES (1982); 2. postgres=# Alter table tab_rowfilter_1 alter column b set data type varchar; ERROR: unexpected object depending on column: publication of table tab_rowfilter_1 in publication tap_pub_1 I think for this you need to change ATExecAlterColumnType to handle the publication case. -- With Regards, Amit Kapila.