Hi Peter,
I have now tried scenario (b) that Andres suggested. I did some testing
with the row-filter-patch v74 at various levels of filtering, from 0%
to 100% of rows replicated. The times below are in seconds; I did 5
runs for each case.
Results:
RUN  HEAD       v74 0%     v74 25%    v74 50%    v74 75%    v74 100%
1    17.26178   12.573736  12.869635  13.742167  17.977112  17.75814
2    17.522473  12.919554  12.640879  14.202737  14.515481  16.961836
3    17.124001  12.640879  12.706631  14.220245  15.686613  17.219355
4    17.24122   12.602345  12.674566  14.219423  15.564312  17.432765
5    17.25352   12.610657  12.689842  14.210725  15.613708  17.403821
As can be seen, the performance on HEAD is similar to what the patch
achieves when all rows are replicated (100% allowed). The performance
improves linearly as more rows are filtered out.
The test scenario used was:
1. On publisher and subscriber:
CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));
2. On the publisher, create one of the following publications,
depending on the scenario being tested:
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed
3. On the subscriber:
CREATE SUBSCRIPTION sync_sub CONNECTION 'host=127.0.0.1 port=5432
dbname=postgres application_name=sync_sub' PUBLICATION pub_1;
4. On the publisher, set synchronous_standby_names = 'sync_sub' in
postgresql.conf and restart.
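(As a sanity check before timing anything -- this was not part of the
original steps -- you can confirm on the publisher that the walsender
for the subscription is actually being treated as synchronous:)

SELECT application_name, sync_state
FROM pg_stat_replication
WHERE application_name = 'sync_sub';  -- expect sync_state = 'sync'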
5. The test case:
DO
$do$
BEGIN
    FOR i IN 1..1000001 BY 10 LOOP
        INSERT INTO test VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test SET value = 'FOO' WHERE key = i;
        IF i % 1000 = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
END
$do$;
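(For anyone wanting to repeat the runs: one simple way to collect
comparable elapsed times -- a sketch only, not necessarily the exact
harness used here -- is to save the DO block above as a file, say
testcase.sql, reset the test table between runs, and drive it from a
shell loop with GNU time:)

# hypothetical wrapper; assumes the DO block is saved as testcase.sql
# and the test table is emptied on both nodes between runs
for run in 1 2 3 4 5; do
    /usr/bin/time -f 'run %e seconds' psql -d postgres -f testcase.sql
done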
regards,
Ajin Cherian
Fujitsu Australia
On Tue, Feb 1, 2022 at 12:07 PM Peter Smith <[email protected]> wrote:
>
> On Sat, Jan 29, 2022 at 11:31 AM Andres Freund <[email protected]> wrote:
> >
> > Hi,
> >
> > Are there any recent performance evaluations of the overhead of row
> > filters? I
> > think it'd be good to get some numbers comparing:
> >
> > 1) $workload with master
> > 2) $workload with patch, but no row filters
> > 3) $workload with patch, row filter matching everything
> > 4) $workload with patch, row filter matching few rows
> >
> > For workload I think it'd be worth testing:
> > a) bulk COPY/INSERT into one table
> > b) Many transactions doing small modifications to one table
> > c) Many transactions targetting many different tables
> > d) Interspersed DDL + small changes to a table
> >
>
> I have gathered performance data for the workload case (a):
>
> HEAD 46743.75
> v74 no filters 46929.15
> v74 allow 100% 46926.09
> v74 allow 75% 40617.74
> v74 allow 50% 35744.17
> v74 allow 25% 29468.93
> v74 allow 0% 22540.58
>
> PSA.
>
> This was tested using patch v74 and synchronous pub/sub. There are 1M
> INSERTS for publications using differing amounts of row filtering (or
> none).
>
> Observations:
> - There seems insignificant row-filter overheads (e.g. viz no filter
> and 100% allowed versus HEAD).
> - The elapsed time decreases linearly as there is less data getting
> replicated.
>
> I will post the results for other workload kinds (b, c, d) when I have them.
>
> ------
> Kind Regards,
> Peter Smith.
> Fujitsu Australia.