Hi Peter,
I have now tried scenario (b) that Andres suggested. I did some testing
with the row-filter-patch v74 at various levels of filtering, from 0%
to 100% of rows replicated. The times below are in seconds; I did 5
runs for each case.
Results:
RUN  HEAD       v74 0%     v74 25%    v74 50%    v74 75%    v74 100%
1    17.26178   12.573736  12.869635  13.742167  17.977112  17.75814
2    17.522473  12.919554  12.640879  14.202737  14.515481  16.961836
3    17.124001  12.640879  12.706631  14.220245  15.686613  17.219355
4    17.24122   12.602345  12.674566  14.219423  15.564312  17.432765
5    17.25352   12.610657  12.689842  14.210725  15.613708  17.403821
As can be seen, the performance on HEAD is similar to what the patch
achieves when all rows are replicated (100% allowed). The performance
improves linearly as more rows are filtered out.
The test scenario used was:
1. On publisher and subscriber:
CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));
2. On the publisher, create one of the following publications,
depending on the scenario being tested:
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed
3. On the subscriber:
CREATE SUBSCRIPTION sync_sub CONNECTION 'host=127.0.0.1 port=5432
dbname=postgres application_name=sync_sub' PUBLICATION pub_1;
4. On the publisher, set synchronous_standby_names = 'sync_sub' in
postgresql.conf and restart.
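(As a sanity check before timing anything -- this was not part of the
original steps -- you can confirm on the publisher that the walsender
for the subscription is actually being treated as synchronous:)

SELECT application_name, sync_state
FROM pg_stat_replication
WHERE application_name = 'sync_sub';  -- expect sync_state = 'sync'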
5. The test case:
DO
$do$
BEGIN
    FOR i IN 1..1000001 BY 10 LOOP
        INSERT INTO test VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test SET value = 'FOO' WHERE key = i;
        IF i % 1000 = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
END
$do$;
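(For anyone wanting to repeat the runs: one simple way to collect
comparable elapsed times -- a sketch only, not necessarily the exact
harness used here -- is to save the DO block above as a file, say
testcase.sql, reset the test table between runs, and drive it from a
shell loop with GNU time:)

# hypothetical wrapper; assumes the DO block is saved as testcase.sql
# and the test table is emptied on both nodes between runs
for run in 1 2 3 4 5; do
    /usr/bin/time -f 'run %e seconds' psql -d postgres -f testcase.sql
done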
regards,
Ajin Cherian
Fujitsu Australia
On Tue, Feb 1, 2022 at 12:07 PM Peter Smith <[email protected]> wrote:
>
> On Sat, Jan 29, 2022 at 11:31 AM Andres Freund <[email protected]> wrote:
> >
> > Hi,
> >
> > Are there any recent performance evaluations of the overhead of row
> > filters? I
> > think it'd be good to get some numbers comparing:
> >
> > 1) $workload with master
> > 2) $workload with patch, but no row filters
> > 3) $workload with patch, row filter matching everything
> > 4) $workload with patch, row filter matching few rows
> >
> > For workload I think it'd be worth testing:
> > a) bulk COPY/INSERT into one table
> > b) Many transactions doing small modifications to one table
> > c) Many transactions targetting many different tables
> > d) Interspersed DDL + small changes to a table
> >
>
> I have gathered performance data for the workload case (a):
>
> HEAD 46743.75
> v74 no filters 46929.15
> v74 allow 100% 46926.09
> v74 allow 75% 40617.74
> v74 allow 50% 35744.17
> v74 allow 25% 29468.93
> v74 allow 0% 22540.58
>
> PSA.
>
> This was tested using patch v74 and synchronous pub/sub. There are 1M
> INSERTS for publications using differing amounts of row filtering (or
> none).
>
> Observations:
> - There seems insignificant row-filter overheads (e.g. viz no filter
> and 100% allowed versus HEAD).
> - The elapsed time decreases linearly as there is less data getting
> replicated.
>
> I will post the results for other workload kinds (b, c, d) when I have them.
>
> ------
> Kind Regards,
> Peter Smith.
> Fujitsu Australia.