Re: Berserk Autovacuum (let's save next Mandrill)

2020-04-02 Thread Tom Lane
David Rowley  writes:
> On Fri, 3 Apr 2020 at 04:46, Tom Lane  wrote:
>> Concretely, I suggest the attached, which replaces the autovac disables
>> with adjusting partition boundaries so that the partitions contain
>> different numbers of rows.

> I've looked over this and I agree that it's a better solution to the problem.
> I'm happy for you to go ahead on this.

Pushed, thanks for looking at it!

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-04-02 Thread David Rowley
On Fri, 3 Apr 2020 at 04:46, Tom Lane  wrote:
>
> I wrote:
> > I'd be inclined to undo what you did in favor of initializing the
> > test tables to contain significantly different numbers of rows,
> > because that would (a) achieve plan stability more directly,
> > and (b) demonstrate that the planner is actually ordering the
> > tables by cost correctly.  Maybe somewhere else we have a test
> > that is verifying (b), but these test cases abysmally fail to
> > check that point.
>
> Concretely, I suggest the attached, which replaces the autovac disables
> with adjusting partition boundaries so that the partitions contain
> different numbers of rows.

I've looked over this and I agree that it's a better solution to the problem.

I'm happy for you to go ahead on this.

David




Re: Berserk Autovacuum (let's save next Mandrill)

2020-04-02 Thread Tom Lane
I wrote:
> I'd be inclined to undo what you did in favor of initializing the
> test tables to contain significantly different numbers of rows,
> because that would (a) achieve plan stability more directly,
> and (b) demonstrate that the planner is actually ordering the
> tables by cost correctly.  Maybe somewhere else we have a test
> that is verifying (b), but these test cases abysmally fail to
> check that point.

Concretely, I suggest the attached, which replaces the autovac disables
with adjusting partition boundaries so that the partitions contain
different numbers of rows.

I did not touch the partition boundaries for pagg_tab1 and pagg_tab2,
because that would have required also changing the associated test
queries (which are designed to access only particular partitions).
It seemed like too much work to verify that the answers were still
right, and it's not really necessary because those tables are so
small as to fit in single pages.  That means that autovac will either
see the whole table or none of it, and in either case it won't change
reltuples.

This does cause the order of partitions to change in a couple of the
pagg_tab_ml EXPLAIN results, but I think that's fine; the ordering
does now match the actual sizes of the partitions.

regards, tom lane

diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index d8a6836..a4dc12b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -2,9 +2,9 @@
 -- PARTITION_AGGREGATE
 -- Test partitionwise aggregation on partitioned tables
 --
--- Note: Various tests located within are sensitive to tables being
--- auto-vacuumed while the tests are running.  For this reason we
--- run autovacuum_enabled = off for all tables.
+-- Note: to ensure plan stability, it's a good idea to make the partitions of
+-- any one partitioned table in this test all have different numbers of rows.
+--
 -- Enable partitionwise aggregate, which by default is disabled.
 SET enable_partitionwise_aggregate TO true;
 -- Enable partitionwise join, which by default is disabled.
@@ -15,9 +15,9 @@ SET max_parallel_workers_per_gather TO 0;
 -- Tests for list partitioned tables.
 --
 CREATE TABLE pagg_tab (a int, b int, c text, d int) PARTITION BY LIST(c);
-CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('', '0001', '0002', '0003') WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0004', '0005', '0006', '0007') WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0008', '0009', '0010', '0011') WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab_p1 PARTITION OF pagg_tab FOR VALUES IN ('', '0001', '0002', '0003', '0004');
+CREATE TABLE pagg_tab_p2 PARTITION OF pagg_tab FOR VALUES IN ('0005', '0006', '0007', '0008');
+CREATE TABLE pagg_tab_p3 PARTITION OF pagg_tab FOR VALUES IN ('0009', '0010', '0011');
 INSERT INTO pagg_tab SELECT i % 20, i % 30, to_char(i % 12, 'FM'), i % 30 FROM generate_series(0, 2999) i;
 ANALYZE pagg_tab;
 -- When GROUP BY clause matches; full aggregation is performed for each partition.
@@ -400,13 +400,13 @@ SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2;
 
 -- JOIN query
 CREATE TABLE pagg_tab1(x int, y int) PARTITION BY RANGE(x);
-CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab1_p1 PARTITION OF pagg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab1_p2 PARTITION OF pagg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab1_p3 PARTITION OF pagg_tab1 FOR VALUES FROM (20) TO (30);
 CREATE TABLE pagg_tab2(x int, y int) PARTITION BY RANGE(y);
-CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20) WITH (autovacuum_enabled = off);
-CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30) WITH (autovacuum_enabled = off);
+CREATE TABLE pagg_tab2_p1 PARTITION OF pagg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE pagg_tab2_p2 PARTITION OF pagg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE pagg_tab2_p3 PARTITION OF pagg_tab2 FOR VALUES FROM (20) TO (30);
 INSERT INTO pagg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
 INSERT INTO pagg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
 ANALYZE pagg_tab1;
@@ -820,9 +820,9 @@ SELECT a.x, a.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x = 1 AND x = 2) a
 
 -- Partition by multiple columns
 CREATE TABLE pagg_tab_m (a int, b int, c int) PARTITION 

Re: Berserk Autovacuum (let's save next Mandrill)

2020-04-02 Thread Tom Lane
David Rowley  writes:
> On Thu, 2 Apr 2020 at 16:13, Tom Lane  wrote:
>> Quite :-(.  While it's too early to declare victory, we've seen no
>> more failures of this ilk since 0936d1b6f, so it's sure looking like
>> autovacuum did have something to do with it.

> How about [1]? It seems related to me and also post 0936d1b6f.

That looks much like the first lousyjack failure, which as I said
I wasn't trying to account for at that point.

After looking at those failures, though, I believe that the root cause
may be the same, ie small changes in pg_class.reltuples due to
autovacuum not seeing all pages of the tables.  The test structure
is a bit different, but it is accessing the tables in between EXPLAIN
attempts, so it could be preventing a concurrent autovac from seeing
all pages.

I see your fix at cefb82d49, but it feels a bit brute-force.  Unlike
stats_ext.sql, we're not (supposed to be) dependent on exact planner
estimates in this test.  So I think the real problem here is crappy test
case design.  Namely, that these various sub-tables are exactly the
same size, despite which the test is expecting that the planner will
order them consistently --- with a planning algorithm that prefers
to put larger tables first in parallel appends (cf. create_append_path).
It's not surprising that the result is unstable in the face of small
variations in the rowcount estimates.
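
For illustration, the ordering at issue is essentially a sort of the
append children by descending total cost; a minimal standalone sketch of
that kind of comparator follows (the struct, names, and cost numbers are
invented for the sketch; the real logic lives in create_append_path,
operating on Path nodes):

#include <stdio.h>
#include <stdlib.h>

/* A stand-in for the per-child info the planner actually keeps. */
typedef struct ChildPath
{
    const char *relname;
    double      total_cost;
} ChildPath;

/* Order children so the most expensive (largest) ones come first. */
static int
cost_desc_compare(const void *a, const void *b)
{
    const ChildPath *pa = (const ChildPath *) a;
    const ChildPath *pb = (const ChildPath *) b;

    if (pa->total_cost > pb->total_cost)
        return -1;
    if (pa->total_cost < pb->total_cost)
        return 1;
    return 0;               /* equal costs: order is effectively arbitrary */
}

int
main(void)
{
    /* Equal-sized partitions get near-identical costs, so a tiny shift in
     * reltuples can flip the order; distinct sizes make it deterministic. */
    ChildPath   children[] = {
        {"child_a", 100.02},
        {"child_b", 99.99},
        {"child_c", 100.01},
    };

    qsort(children, 3, sizeof(ChildPath), cost_desc_compare);

    for (int i = 0; i < 3; i++)
        printf("%s  total_cost=%.2f\n", children[i].relname,
               children[i].total_cost);
    return 0;
}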

I'd be inclined to undo what you did in favor of initializing the
test tables to contain significantly different numbers of rows,
because that would (a) achieve plan stability more directly,
and (b) demonstrate that the planner is actually ordering the
tables by cost correctly.  Maybe somewhere else we have a test
that is verifying (b), but these test cases abysmally fail to
check that point.

I'm not really on board with disabling autovacuum in the regression
tests anywhere we aren't absolutely forced to do so.  It's not
representative of real world practice (or at least not real world
best practice ;-)) and it could help hide actual bugs.  We don't seem
to have much choice with the stats_ext tests as they are constituted,
but those tests look really fragile to me.  Let's not adopt that
technique where we have other possible ways to stabilize test results.

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-04-02 Thread Tomas Vondra

On Wed, Apr 01, 2020 at 11:13:12PM -0400, Tom Lane wrote:

Dean Rasheed  writes:

Yeah, that makes sense. I still can't see what might be causing those
failures. The tests that were doing an ALTER COLUMN and then expecting
to see the results of a non-analysed table ought to be fixed by
0936d1b6f, but that doesn't match the buildfarm failures. Possibly
0936d1b6f will help with those anyway, but if so, it'll be annoying
not understanding why.


Quite :-(.  While it's too early to declare victory, we've seen no
more failures of this ilk since 0936d1b6f, so it's sure looking like
autovacuum did have something to do with it.

Just to save people repeating the search I did, these are the buildfarm
failures of interest so far:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2020%3A03%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-03-28%2022%3A00%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-03-29%2006%3A45%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2020-03-30%2006%3A00%3A06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2006%3A10%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-03-31%2005%3A00%3A35

The first of those is unlike the rest, and I'm not trying to account for
it here.  In the rest, what we see is that sometimes the estimates are off
by a little bit from what's expected, up or down just a percent or two.
And those deltas kick at inconsistent spots partway through a series of
similar tests, so it's hard to deny that *something* asynchronous to the
test script is causing it.

After contemplating the failures for awhile, I have a theory that
at least partially matches the data.  What I think is happening is
that autovacuum (NOT auto-analyze) launches on the table, and since
it is running concurrently with the foreground test script, it fails
to immediately acquire buffer lock on one or more of the table pages.
Since this isn't an aggressive vacuum scan, it just politely backs
off and doesn't scan those pages.  And that translates to not getting
a perfectly accurate reltuples estimate at the end of the vacuum.
On my x86_64 machine, which matches the buildfarm critters having
trouble, the actual contents of both of the troublesome tables will
be 5000 tuples in 31 pages --- which comes out to be 30 pages with
162 tuples each and then 140 tuples in the last page.  Working through
the math in vac_estimate_reltuples (and assuming that the "old" values
were accurate numbers from the test script's own ANALYZE), what I find
is that autovacuum will conclude there are 4999 tuples if it misses
scanning one of the first 30 pages, or 5021 tuples if it misses scanning
the last page, because its interpolation from the old tuple density
figure will underestimate or overestimate the number of missed tuples
accordingly.  Once that slightly-off number gets pushed into pg_class,
we start to get slightly-off rowcount estimates in the test cases.

So what I'm hypothesizing is that the pg_statistic data is perfectly
fine but pg_class.reltuples goes off a little bit after autovacuum.
The percentage changes in reltuples that I predict this way don't
quite square with the percentage changes we see in the overall
rowcount estimates, which is a problem for this theory.  But the test
cases are exercising some fairly complex estimation logic, and it
wouldn't surprise me much if the estimates aren't linearly affected by
reltuples.  (Tomas, do you want to comment further on that point?)



I think this theory makes perfect sense. I think it's much less likely
to see the last page skipped, so we're likely to end up with reltuples
lower than 5000 (as opposed to seeing the 5021). That kinda matches the
reports, where we generally see estimates reduced by 1 or 2. The -1
change could be explained by rounding errors, I guess - with 5000 we
might have produced 139.51, rounded up to 140, and a slight drop may get us
139. Not sure about the -2 changes, but I suppose it's possible we might
actually skip multiple pages, reducing the reltuples estimate even more?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-04-01 Thread David Rowley
On Thu, 2 Apr 2020 at 16:13, Tom Lane  wrote:
>
> Dean Rasheed  writes:
> > Yeah, that makes sense. I still can't see what might be causing those
> > failures. The tests that were doing an ALTER COLUMN and then expecting
> > to see the results of a non-analysed table ought to be fixed by
> > 0936d1b6f, but that doesn't match the buildfarm failures. Possibly
> > 0936d1b6f will help with those anyway, but if so, it'll be annoying
> > not understanding why.
>
> Quite :-(.  While it's too early to declare victory, we've seen no
> more failures of this ilk since 0936d1b6f, so it's sure looking like
> autovacuum did have something to do with it.

How about [1]? It seems related to me and also post 0936d1b6f.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-04-01%2017%3A03%3A05




Re: Berserk Autovacuum (let's save next Mandrill)

2020-04-01 Thread Tom Lane
Dean Rasheed  writes:
> Yeah, that makes sense. I still can't see what might be causing those
> failures. The tests that were doing an ALTER COLUMN and then expecting
> to see the results of a non-analysed table ought to be fixed by
> 0936d1b6f, but that doesn't match the buildfarm failures. Possibly
> 0936d1b6f will help with those anyway, but if so, it'll be annoying
> not understanding why.

Quite :-(.  While it's too early to declare victory, we've seen no
more failures of this ilk since 0936d1b6f, so it's sure looking like
autovacuum did have something to do with it.

Just to save people repeating the search I did, these are the buildfarm
failures of interest so far:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2020%3A03%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grison&dt=2020-03-28%2022%3A00%3A19
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2020-03-29%2006%3A45%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2020-03-30%2006%3A00%3A06
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2006%3A10%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2020-03-31%2005%3A00%3A35

The first of those is unlike the rest, and I'm not trying to account for
it here.  In the rest, what we see is that sometimes the estimates are off
by a little bit from what's expected, up or down just a percent or two.
And those deltas kick at inconsistent spots partway through a series of
similar tests, so it's hard to deny that *something* asynchronous to the
test script is causing it.

After contemplating the failures for awhile, I have a theory that
at least partially matches the data.  What I think is happening is
that autovacuum (NOT auto-analyze) launches on the table, and since
it is running concurrently with the foreground test script, it fails
to immediately acquire buffer lock on one or more of the table pages.
Since this isn't an aggressive vacuum scan, it just politely backs
off and doesn't scan those pages.  And that translates to not getting
a perfectly accurate reltuples estimate at the end of the vacuum.
On my x86_64 machine, which matches the buildfarm critters having
trouble, the actual contents of both of the troublesome tables will
be 5000 tuples in 31 pages --- which comes out to be 30 pages with
162 tuples each and then 140 tuples in the last page.  Working through
the math in vac_estimate_reltuples (and assuming that the "old" values
were accurate numbers from the test script's own ANALYZE), what I find
is that autovacuum will conclude there are 4999 tuples if it misses
scanning one of the first 30 pages, or 5021 tuples if it misses scanning
the last page, because its interpolation from the old tuple density
figure will underestimate or overestimate the number of missed tuples
accordingly.  Once that slightly-off number gets pushed into pg_class,
we start to get slightly-off rowcount estimates in the test cases.
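
For concreteness, here is a minimal standalone sketch of that
interpolation with the page and tuple counts above plugged in (the
function name and layout are invented; it only mirrors the shape of
vac_estimate_reltuples, heavily simplified):

/*
 * Illustrative sketch only: tuples in unscanned pages are assumed to
 * match the old tuple density, as vac_estimate_reltuples does.
 */
#include <math.h>
#include <stdio.h>

static double
estimate_reltuples(double old_rel_pages, double old_rel_tuples,
                   double total_pages, double scanned_pages,
                   double scanned_tuples)
{
    double old_density = old_rel_tuples / old_rel_pages;
    double unscanned_pages = total_pages - scanned_pages;

    if (scanned_pages >= total_pages)
        return scanned_tuples;  /* scanned everything: use the exact count */

    return floor(old_density * unscanned_pages + scanned_tuples + 0.5);
}

int
main(void)
{
    /* 5000 tuples in 31 pages: 30 pages of 162 plus a last page of 140 */

    /* miss one of the 162-tuple pages: 29 * 162 + 140 = 4838 tuples seen */
    printf("miss a full page:   %.0f\n",
           estimate_reltuples(31, 5000, 31, 30, 4838));   /* prints 4999 */

    /* miss the 140-tuple last page: 30 * 162 = 4860 tuples seen */
    printf("miss the last page: %.0f\n",
           estimate_reltuples(31, 5000, 31, 30, 4860));   /* prints 5021 */

    return 0;
}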

So what I'm hypothesizing is that the pg_statistic data is perfectly
fine but pg_class.reltuples goes off a little bit after autovacuum.
The percentage changes in reltuples that I predict this way don't
quite square with the percentage changes we see in the overall
rowcount estimates, which is a problem for this theory.  But the test
cases are exercising some fairly complex estimation logic, and it
wouldn't surprise me much if the estimates aren't linearly affected by
reltuples.  (Tomas, do you want to comment further on that point?)

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-04-01 Thread Dean Rasheed
On Tue, 31 Mar 2020 at 22:16, Tom Lane  wrote:
>
> > Dean Rasheed  writes:
> >> ...
> >> It looks to me as though the problem is that statext_store() needs to
> >> take its lock on pg_statistic_ext_data *before* searching for the
> >> stats tuple to update.
>
> > Hmm, yeah, that seems like clearly a bad idea.
>
> I pushed a fix for that

Thanks for doing that (looks like it was my mistake originally).

> but I think it must be unrelated to the
> buildfarm failures we're seeing.  For that coding to be a problem,
> it would have to run concurrently with a VACUUM FULL or CLUSTER
> on pg_statistic_ext_data (which would give all the tuples new TIDs).
> AFAICS that won't happen with the tests that are giving trouble.
>

Yeah, that makes sense. I still can't see what might be causing those
failures. The tests that were doing an ALTER COLUMN and then expecting
to see the results of a non-analysed table ought to be fixed by
0936d1b6f, but that doesn't match the buildfarm failures. Possibly
0936d1b6f will help with those anyway, but if so, it'll be annoying
not understanding why.

Regards,
Dean




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-31 Thread Tom Lane
I wrote:
> Dean Rasheed  writes:
>> I had a go at reproducing this. I wasn't able to produce the reported
>> failure, but I can reliably produce an Assert failure that may be
>> related by doing a VACUUM FULL simultaneously with an ANALYZE that is
>> generating extended stats, which produces:
>> ...
>> It looks to me as though the problem is that statext_store() needs to
>> take its lock on pg_statistic_ext_data *before* searching for the
>> stats tuple to update.

> Hmm, yeah, that seems like clearly a bad idea.

I pushed a fix for that, but I think it must be unrelated to the
buildfarm failures we're seeing.  For that coding to be a problem,
it would have to run concurrently with a VACUUM FULL or CLUSTER
on pg_statistic_ext_data (which would give all the tuples new TIDs).
AFAICS that won't happen with the tests that are giving trouble.

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-31 Thread Tom Lane
Dean Rasheed  writes:
> I had a go at reproducing this. I wasn't able to produce the reported
> failure, but I can reliably produce an Assert failure that may be
> related by doing a VACUUM FULL simultaneously with an ANALYZE that is
> generating extended stats, which produces:
> ...
> It looks to me as though the problem is that statext_store() needs to
> take its lock on pg_statistic_ext_data *before* searching for the
> stats tuple to update.

Hmm, yeah, that seems like clearly a bad idea.

> It's late here, so I haven't worked up a patch yet, but it looks
> pretty straightforward.

I can take care of it.

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-31 Thread Dean Rasheed
On Tue, 31 Mar 2020 at 04:39, David Rowley  wrote:
>
> On Sat, 28 Mar 2020 at 22:22, David Rowley  wrote:
> > I'm unsure yet if this has caused an instability on lousyjack's run in
> > [1].
>
> pogona has just joined in on the fun [1], so we're not out of the woods
> on this yet. I'll start having a look at this in more detail.
>
> [1] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03
>

I had a go at reproducing this. I wasn't able to produce the reported
failure, but I can reliably produce an Assert failure that may be
related by doing a VACUUM FULL simultaneously with an ANALYZE that is
generating extended stats, which produces:

#0  0x7f28081c9520 in raise () from /lib64/libc.so.6
#1  0x7f28081cab01 in abort () from /lib64/libc.so.6
#2  0x00aad1ad in ExceptionalCondition (conditionName=0xb2f1a1
"ItemIdIsNormal(lp)", errorType=0xb2e7c9 "FailedAssertion",
fileName=0xb2e848 "heapam.c", lineNumber=3016) at assert.c:67
#3  0x004fb79e in heap_update (relation=0x7f27feebeda8,
otid=0x2d881fc, newtup=0x2d881f8, cid=0, crosscheck=0x0, wait=true,
tmfd=0x7ffc568a5900, lockmode=0x7ffc568a58fc) at heapam.c:3016
#4  0x004fdead in simple_heap_update (relation=0x7f27feebeda8,
otid=0x2d881fc, tup=0x2d881f8) at heapam.c:3902
#5  0x005be860 in CatalogTupleUpdate (heapRel=0x7f27feebeda8,
otid=0x2d881fc, tup=0x2d881f8) at indexing.c:230
#6  0x008df898 in statext_store (statOid=18964, ndistinct=0x0,
dependencies=0x2a85fe0, mcv=0x0, stats=0x2a86570) at
extended_stats.c:553
#7  0x008deec0 in BuildRelationExtStatistics
(onerel=0x7f27feed9008, totalrows=5000, numrows=5000, rows=0x2ad5a30,
natts=7, vacattrstats=0x2a75f40) at extended_stats.c:187
#8  0x0065c1b7 in do_analyze_rel (onerel=0x7f27feed9008,
params=0x7ffc568a5fc0, va_cols=0x0, acquirefunc=0x65ce37
, relpages=31, inh=false, in_outer_xact=false,
elevel=13) at analyze.c:606
#9  0x0065b532 in analyze_rel (relid=18956,
relation=0x29b0bc0, params=0x7ffc568a5fc0, va_cols=0x0,
in_outer_xact=false, bstrategy=0x2a7dfa0) at analyze.c:263
#10 0x006fd768 in vacuum (relations=0x2a7e148,
params=0x7ffc568a5fc0, bstrategy=0x2a7dfa0, isTopLevel=true) at
vacuum.c:468
#11 0x006fd22c in ExecVacuum (pstate=0x2a57a00,
vacstmt=0x29b0ca8, isTopLevel=true) at vacuum.c:251

It looks to me as though the problem is that statext_store() needs to
take its lock on pg_statistic_ext_data *before* searching for the
stats tuple to update.

It's late here, so I haven't worked up a patch yet, but it looks
pretty straightforward.

Regards,
Dean




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-30 Thread David Rowley
On Sat, 28 Mar 2020 at 22:22, David Rowley  wrote:
> I'm unsure yet if this has caused an instability on lousyjack's run in
> [1].

pogona has just joined in on the fun [1], so we're not out of the woods
on this yet. I'll start having a look at this in more detail.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pogona&dt=2020-03-30%2023%3A10%3A03




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-30 Thread David Rowley
On Mon, 30 Mar 2020 at 19:49, David Rowley  wrote:
> I'll see if I can come up with a more deterministic way to determine
> which tables to add vacuums for, rather than waiting for failures and
> reacting after the fact.

I ended up running make installcheck on an instance with
autovacuum_naptime set to 1s with a small additional debug line in
autovacuum.c, namely:

diff --git a/src/backend/postmaster/autovacuum.c
b/src/backend/postmaster/autovacuum.c
index 7e97ffab27..ad81e321dc 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -3099,6 +3099,9 @@ relation_needs_vacanalyze(Oid relid,
*dovacuum = force_vacuum || (vactuples > vacthresh) ||
(vac_ins_base_thresh >= 0 &&
instuples > vacinsthresh);
*doanalyze = (anltuples > anlthresh);
+
+   if (vac_ins_base_thresh >= 0 && instuples > vacinsthresh)
+   elog(LOG, " %s", NameStr(classForm->relname));
}
else
{

I grepped the log after the installcheck to grab the table names that
saw an insert vacuum during the test, then grepped the test output to
see if the table appears to pose a risk of test instability.

I've classed each table with a risk factor.  "VeryLow" means there's
almost no risk because we never look at EXPLAIN.  "Low" risk tables look
at EXPLAIN, but I feel they're not looking in enough detail to cause
issues.  "Medium" risk tables look at EXPLAIN and I feel there's some
risk of a change; I think these are all Append nodes, which order their
subnodes based on cost.  "High" risk tables are the ones I'm about to
look into changing.

The full results of my analysis are:

Table: agg_group_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow
Table: agg_hash_1 aggregates.out. Nothing looks at EXPLAIN. Risk:VeryLow
Table: atest12 privileges.out. Lots of looking at EXPLAIN, but nothing
appears to look into row estimates in detail. Risk:Low
Table: brin_test brin.out.  Test already does VACUUM ANALYZE. Risk:VeryLow
Table: bt_f8_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_i4_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_name_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: bt_txt_heap btree_index.out, create_index.out. Rows loaded in
copy.source. Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: dupindexcols create_index.out. Some looking at EXPLAIN plans,
but nothing appears to look into row estimates in detail. Risk:Low
Table: fast_emp4000 create_am.out, create_index.out, create_misc.out.
Lots of looking at EXPLAIN, but nothing appears to look into row
estimates in detail. Risk:Low
Table: functional_dependencies stats_ext.out. Lots of looking at
EXPLAIN output. Test looks at row estimates. Risk:High
Table: gist_tbl gist.out. Lots of looking at EXPLAIN, but nothing
appears to look into row estimates in detail. Risk:Low
Table: hash_f8_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_i4_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_name_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: hash_txt_heap hash_index.out. Rows loaded in copy.source.
Nothing appears to look at EXPLAIN. Risk:VeryLow
Table: kd_point_tbl create_index_spgist.out. Lots of looking at
EXPLAIN, but nothing appears to look into row estimates in detail.
Risk:Low
Table: mcv_lists stats_ext.out. Lots of looking at EXPLAIN, but tests
appear to VACUUM after loading rows. Risk:Low
Table: mcv_lists_arrays stats_ext.out. Nothing appears to look at
EXPLAIN. Risk:VeryLow
Table: mcv_lists_bool stats_ext.out.  Lots of looking at EXPLAIN
output. Test looks at row estimates. Risk:High
Table: ndistinct stats_ext.out.  Lots of looking at EXPLAIN output.
Test looks at row estimates. Only 1000 rows are loaded initially and
then 5000 after a truncate. 1000 rows won't trigger the auto-vacuum.
Risk:High
Table: onek Lots of files. Sees a VACUUM in sanity_check test,
however, some tests run before sanity_check, e.g. create_index,
select, copy, none of which appear to pay particular attention to
anything vacuum might change. Risk:Low
Table: pagg_tab_ml_p2_s1 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p2_s2 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p3_s1 partition_aggregate.out Appears to be some
risk of Append reordering partitions based on cost. Risk:Medium
Table: pagg_tab_ml_p3_s2 partition_aggregate.out Appears to be some
risk of Append reorder

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread David Rowley
On Mon, 30 Mar 2020 at 17:57, Laurenz Albe  wrote:
> How can it be that even after an explicit VACUUM, this patch can cause
> unstable regression test results?

I only added vacuums for mcv_lists. The problem with petalura [1] is
with the functional_dependencies table.

I'll see if I can come up with a more deterministic way to determine
which tables to add vacuums for, rather than waiting for failures and
reacting after the fact.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread Laurenz Albe
On Sat, 2020-03-28 at 19:21 +1300, David Rowley wrote:
> Thank you.  Pushed.

Thanks for your efforts on this, and thanks for working on the fallout.

How can it be that even after an explicit VACUUM, this patch can cause
unstable regression test results?

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Mar 30, 2020 at 7:47 AM Tom Lane  wrote:
>> But the ones that were seemingly due to that were intermittent,
>> so we'll have to watch for awhile.

> Today, stats_ext failed on petalura [1].  Can it be due to this?  I
> have also committed a patch, but I don't immediately see how it could be
> related to my commit.

Yeah, this looks just like petalura's previous failures, so the
problem is still there :-(

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread Amit Kapila
On Mon, Mar 30, 2020 at 7:47 AM Tom Lane  wrote:
>
> David Rowley  writes:
> > I don't believe any of the current buildfarm failures can be
> > attributed to any of the recent changes to autovacuum, but I'll
> > continue to monitor the farm to see if anything is suspect.
>
> I agree none of the failures I see right now are related to that
> (there's some "No space left on device" failures, Windows randomicity,
> snapper's compiler bug, and don't-know-what on hyrax).
>
> But the ones that were seemingly due to that were intermittent,
> so we'll have to watch for awhile.
>

Today, stats_ext failed on petalura [1].  Can it be due to this?  I
have also committed a patch, but I don't immediately see how it could be
related to my commit.

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-30%2002%3A20%3A03

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread Tom Lane
David Rowley  writes:
> I don't believe any of the current buildfarm failures can be
> attributed to any of the recent changes to autovacuum, but I'll
> continue to monitor the farm to see if anything is suspect.

I agree none of the failures I see right now are related to that
(there's some "No space left on device" failures, Windows randomicity,
snapper's compiler bug, and don't-know-what on hyrax).

But the ones that were seemingly due to that were intermittent,
so we'll have to watch for awhile.

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-29 Thread David Rowley
On Sun, 29 Mar 2020 at 15:29, David Rowley  wrote:
> I'm considering pushing the attached to try to get some confirmation
> that additional autovacuums are the issue. However, I'm not too sure
> it's a wise idea, as I can trigger an additional auto-vacuum and
> have these new tests fail with make installcheck after setting
> autovacuum_naptime to 1s, but I'm not getting the other diffs
> experienced by lousyjack and petalura.  The patch may just cause more
> failures without proving much, especially so with slower machines.

Instead of the above, I ended up modifying the two intermittently
failing tests to change the ANALYZE into a VACUUM ANALYZE.  This
should prevent autovacuum sneaking in a vacuum at some point in time
after the ANALYZE has taken place.

I don't believe any of the current buildfarm failures can be
attributed to any of the recent changes to autovacuum, but I'll
continue to monitor the farm to see if anything is suspect.

David




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-28 Thread David Rowley
On Sun, 29 Mar 2020 at 10:30, David Rowley  wrote:
>
> On Sun, 29 Mar 2020 at 06:26, Tom Lane  wrote:
> >
> > David Rowley  writes:
> > > On Sat, 28 Mar 2020 at 17:12, Laurenz Albe  
> > > wrote:
> > >> In the light of that, I have no objections.
> >
> > > Thank you.  Pushed.
> >
> > It seems like this commit has resulted in some buildfarm instability:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
> >
> > apparent change of plan
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05
> >
> > unstable results in stats_ext test
>
> Yeah, thanks for pointing that out.  I'm just doing some tests locally
> to see if I can recreate those results after vacuuming the mcv_lists
> table; so far I'm unable to.

I'm considering pushing the attached to try to get some confirmation
that additional autovacuums are the issue. However, I'm not too sure
it's a wise idea to as I can trigger an additional auto-vacuum and
have these new tests fail with make installcheck after setting
autovacuum_naptime to 1s, but I'm not getting the other diffs
experienced by lousyjack and petalura.  The patch may just cause more
failures without proving much, especially so with slower machines.

The other idea I had was just to change the
autovacuum_vacuum_insert_threshold relopt to -1 for the problem tables
and see if that stabilises things.

Yet another option would be to see if reltuples varies between runs
and ditch the autovacuum_count column from the attached.  There does
not appear to be any part of the tests which would cause any dead
tuples in any of the affected relations, so I'm unsure why reltuples
would vary between what ANALYZE and VACUUM would set it to.

I'm still thinking. Input welcome.

David


temp_telemetry_checks_for_unstable_tests.patch
Description: Binary data


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-28 Thread David Rowley
On Sun, 29 Mar 2020 at 06:26, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Sat, 28 Mar 2020 at 17:12, Laurenz Albe  wrote:
> >> In the light of that, I have no objections.
>
> > Thank you.  Pushed.
>
> It seems like this commit has resulted in some buildfarm instability:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02
>
> apparent change of plan
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05
>
> unstable results in stats_ext test

Yeah, thanks for pointing that out.  I'm just doing some tests locally
to see if I can recreate those results after vacuuming the mcv_lists
table; so far I'm unable to.




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-28 Thread Tom Lane
David Rowley  writes:
> On Sat, 28 Mar 2020 at 17:12, Laurenz Albe  wrote:
>> In the light of that, I have no objections.

> Thank you.  Pushed.

It seems like this commit has resulted in some buildfarm instability:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02

apparent change of plan

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2009%3A20%3A05
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=petalura&dt=2020-03-28%2013%3A20%3A05

unstable results in stats_ext test

I initially thought that Dean's functional-stats adjustment might be
the culprit, but the timestamps on these failures disprove that.

regards, tom lane




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-28 Thread David Rowley
On Sat, 28 Mar 2020 at 19:21, David Rowley  wrote:
> Thank you.  Pushed.

I'm unsure yet if this has caused an instability on lousyjack's run in
[1]. I see that table does have 30,000 rows inserted, so it does seem
probable that it may receive an autovacuum now when it didn't before. I
did a quick local test to see if swapping the "ANALYZE pagg_tab_ml;"
to "VACUUM ANALYZE pagg_tab_ml;" would do the same on my local
machine, but it didn't.

I'll keep an eye on lousyjack's next run.  If it passes next run, I
may add some SQL to determine if pg_stat_all_tables.autovacuum_count
for those tables are varying between passing and failing runs.

David

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2020-03-28%2006%3A33%3A02




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-27 Thread David Rowley
On Sat, 28 Mar 2020 at 17:12, Laurenz Albe  wrote:
> In the light of that, I have no objections.

Thank you.  Pushed.

David




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-27 Thread Laurenz Albe
On Sat, 2020-03-28 at 11:59 +1300, David Rowley wrote:
> On Fri, 27 Mar 2020 at 22:40, Laurenz Albe  wrote:
> > The new meaning of -2 should be documented, other than that it looks
> > good to me.
> 
> But the users don't need to know anything about -2. It's not possible
> to explicitly set the value to -2. This is just the reset value of the
> reloption which means "use the GUC".

I see.

> > I'll accept the new semantics, but they don't make me happy.  People are
> > used to -1 meaning "use the GUC value instead".
> 
> The problem with having -1 on the reloption meaning use the GUC, in
> this case, is that it means the reset value of the reloption must be
> -1 and we need to allow them to set -2 explicitly, and if we do that,
> then -1 also becomes a valid value that users can set. Maybe that's
> not the end of the world, but I'd rather have the reset value be
> unsettable by users. To me, that's less confusing as there are fewer
> special values to remember the meaning of.
> 
> The reason I want a method to explicitly disable the feature is the
> fact that it's easy to document and it should reduce the number of
> people who are confused about the best method to disable the feature.
> I know there's going to be a non-zero number of people who'll want to
> do that.

In the light of that, I have no objections.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-27 Thread David Rowley
On Fri, 27 Mar 2020 at 22:40, Laurenz Albe  wrote:
> The new meaning of -2 should be documented, other than that it looks
> good to me.

But the users don't need to know anything about -2. It's not possible
to explicitly set the value to -2. This is just the reset value of the
reloption which means "use the GUC".

> I'll accept the new semantics, but they don't make me happy.  People are
> used to -1 meaning "use the GUC value instead".

The problem with having -1 on the reloption meaning use the GUC, in
this case, is that it means the reset value of the reloption must be
-1 and we need to allow them to set -2 explicitly, and if we do that,
then -1 also becomes a valid value that users can set. Maybe that's
not the end of the world, but I'd rather have the reset value be
unsettable by users. To me, that's less confusing as there are fewer
special values to remember the meaning of.
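
Roughly, the intended resolution looks like this (an illustrative
standalone sketch with invented names, not the committed code):

#include <stdio.h>

/*
 * Sketch of the intended reloption/GUC fallback for
 * autovacuum_vacuum_insert_threshold:
 *   reloption -2  -> reset value, fall back to the GUC
 *   reloption -1  -> insert-driven vacuums disabled for this table
 *   reloption >=0 -> per-table threshold overrides the GUC
 */
static int
effective_insert_threshold(int reloption_value, int guc_value)
{
    if (reloption_value >= -1)
        return reloption_value;   /* explicitly set, possibly -1 = "off" */
    return guc_value;             /* -2: reset/not set, use the GUC */
}

int
main(void)
{
    printf("reset (-2), GUC 1000 -> %d\n", effective_insert_threshold(-2, 1000));
    printf("set to -1,  GUC 1000 -> %d\n", effective_insert_threshold(-1, 1000));
    printf("set to 50,  GUC 1000 -> %d\n", effective_insert_threshold(50, 1000));
    return 0;
}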

The reason I want a method to explicitly disable the feature is the
fact that it's easy to document and it should reduce the number of
people who are confused about the best method to disable the feature.
I know there's going to be a non-zero number of people who'll want to
do that.




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-27 Thread Laurenz Albe
On Fri, 2020-03-27 at 10:18 +1300, David Rowley wrote:
> > > I believe there are enough options to disable insert-only vacuuming for
> > > an individual table:
> >
> > > - Set the threshold to 2147483647.  True, that will not work for very
> > >   large tables, but I think that there are few tables that insert that
> > >   many rows before they hit autovacuum_freeze_max_age anyway.
> > >
> > > - Set the scale factor to some astronomical value.
> >
> > Meh. You *are* adding new semantics with these. And they're terrible.
> 
> I've modified this to provide a proper way to disable the entire feature
> by allowing the setting to be set to -1. I feel
> people are fairly used to using -1 to disable various features (e.g.
> log_autovacuum_min_duration).  I've used the special value of -2 for
> the reloption to have that cascade to using the GUC instead.  The
> autovacuum_vacuum_insert_threshold reloption may be explicitly set to
> -1 to disable autovacuums for inserts for the relation.
> 
> I've also knocked the default threshold down to 1000. I feel this is a
> better value given that the scale factor is now 0.2. There seemed to
> be no need to exclude smaller tables from seeing gains such as
> index-only scans.
> 
> If nobody objects, I plan to push this one shortly.

Thanks for the help!

The new meaning of -2 should be documented, other than that it looks
good to me.

I'll accept the new semantics, but they don't make me happy.  People are
used to -1 meaning "use the GUC value instead".

I still don't see why we need that.  Contrary to Andres' opinion, I don't
think that disabling a parameter by setting it to a value high enough that
it does not take effect is a bad thing.

I won't put up a fight though.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-26 Thread David Rowley
On Fri, 27 Mar 2020 at 07:51, Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-26 10:12:39 +0100, Laurenz Albe wrote:
> > On Wed, 2020-03-25 at 23:19 +0300, Alexander Korotkov wrote:
> > I am reluctant to introduce new semantics like a reloption value of -2
> > to disable a feature in this patch right before feature freeze.
> >
> > I believe there are enough options to disable insert-only vacuuming for
> > an individual table:
>
> > - Set the threshold to 2147483647.  True, that will not work for very
> >   large tables, but I think that there are few tables that insert that
> >   many rows before they hit autovacuum_freeze_max_age anyway.
> >
> > - Set the scale factor to some astronomical value.
>
> Meh. You *are* adding new semantics with these. And they're terrible.

I've modified this to provide a proper way to disable the entire feature
by allowing the setting to be set to -1. I feel
people are fairly used to using -1 to disable various features (e.g.
log_autovacuum_min_duration).  I've used the special value of -2 for
the reloption to have that cascade to using the GUC instead.  The
autovacuum_vacuum_insert_threshold reloption may be explicitly set to
-1 to disable autovacuums for inserts for the relation.

I've also knocked the default threshold down to 1000. I feel this is a
better value given that the scale factor is now 0.2. There seemed to
be no need to exclude smaller tables from seeing gains such as
index-only scans.
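
With those defaults, the trigger arithmetic works out roughly as in this
small standalone sketch (function name invented; it only simplifies the
condition in relation_needs_vacanalyze and is not the patch itself):

#include <stdbool.h>
#include <stdio.h>

/*
 * Sketch of the insert-driven trigger arithmetic: with threshold 1000 and
 * scale factor 0.2, a table becomes due for vacuum once the tuples
 * inserted since the last vacuum exceed 1000 + 0.2 * reltuples.
 * A base threshold of -1 disables the feature for the table.
 */
static bool
needs_insert_vacuum(double instuples, double reltuples,
                    int base_thresh, double scale_factor)
{
    double vacinsthresh = base_thresh + scale_factor * reltuples;

    return base_thresh >= 0 && instuples > vacinsthresh;
}

int
main(void)
{
    /* a 1M-row table needs just over 201,000 freshly inserted rows */
    printf("%d\n", needs_insert_vacuum(201001, 1000000, 1000, 0.2));  /* 1 */
    printf("%d\n", needs_insert_vacuum(200000, 1000000, 1000, 0.2));  /* 0 */
    /* an explicit -1 switches the feature off regardless of inserts */
    printf("%d\n", needs_insert_vacuum(5000000, 1000000, -1, 0.2));   /* 0 */
    return 0;
}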

If nobody objects, I plan to push this one shortly.

David


insert_autovac_v12.patch
Description: Binary data


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-26 Thread Andres Freund
Hi,

On 2020-03-26 10:12:39 +0100, Laurenz Albe wrote:
> On Wed, 2020-03-25 at 23:19 +0300, Alexander Korotkov wrote:
> I am reluctant to introduce new semantics like a reloption value of -2
> to disable a feature in this patch right before feature freeze.
> 
> I believe there are enough options to disable insert-only vacuuming for
> an individual table:

> - Set the threshold to 2147483647.  True, that will not work for very
>   large tables, but I think that there are few tables that insert that
>   many rows before they hit autovacuum_freeze_max_age anyway.
> 
> - Set the scale factor to some astronomical value.

Meh. You *are* adding new semantics with these. And they're terrible.

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-26 Thread Laurenz Albe
On Wed, 2020-03-25 at 23:19 +0300, Alexander Korotkov wrote:
> On Wed, Mar 25, 2020 at 10:26 PM Andres Freund  wrote:
> > On 2020-03-25 11:05:21 -0500, Justin Pryzby wrote:
> > > Since we talked about how scale_factor can be used to effectively disable 
> > > this
> > > new feature, I thought that scale=100 was too small and suggested 1e10 
> > > (same as
> > > max for vacuum_cleanup_index_scale_factor since 4d54543ef).  That should 
> > > allow
> > > handling the case that analyze is disabled, or its threshold is high, or 
> > > it
> > > hasn't run yet, or it's running but hasn't finished, or analyze is 
> > > triggered at the
> > > same time as vacuum.
> > 
> > For disabling we instead should allow -1, and disable the feature if set
> > to < 0.
> 
> This patch introduces both GUC and reloption.  In reloptions we
> typically use -1 for "disable reloption, use GUC value instead"
> semantics.  So it's unclear how should we allow reloption to both
> disable feature and disable reloption.  I think we don't have a
> precedent in the codebase yet.  We could allow -2 (disable reloption)
> and -1 (disable feature) for reloption.  Opinions?

Here is patch v11, where the reloption has the same upper limit 1e10
as the GUC.  There is no good reason to have them different.

I am reluctant to introduce new semantics like a reloption value of -2
to disable a feature in this patch right before feature freeze.

I believe there are enough options to disable insert-only vacuuming for
an individual table:

- Set the threshold to 2147483647.  True, that will not work for very
  large tables, but I think that there are few tables that insert that
  many rows before they hit autovacuum_freeze_max_age anyway.

- Set the scale factor to some astronomical value.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-25 Thread Alexander Korotkov
On Wed, Mar 25, 2020 at 10:26 PM Andres Freund  wrote:
> On 2020-03-25 11:05:21 -0500, Justin Pryzby wrote:
> > Since we talked about how scale_factor can be used to effectively disable 
> > this
> > new feature, I thought that scale=100 was too small and suggested 1e10 (same 
> > as
> > max for vacuum_cleanup_index_scale_factor since 4d54543ef).  That should 
> > allow
> > handling the case that analyze is disabled, or its threshold is high, or it
> > hasn't run yet, or it's running but hasn't finished, or analyze is 
> > triggered at the
> > same time as vacuum.
>
> For disabling we instead should allow -1, and disable the feature if set
> to < 0.

This patch introduces both GUC and reloption.  In reloptions we
typically use -1 for "disable reloption, use GUC value instead"
semantics.  So it's unclear how should we allow reloption to both
disable feature and disable reloption.  I think we don't have a
precedent in the codebase yet.  We could allow -2 (disable reloption)
and -1 (disable feature) for reloption.  Opinions?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-25 Thread Andres Freund
Hi,

On 2020-03-25 11:05:21 -0500, Justin Pryzby wrote:
> Since we talked about how scale_factor can be used to effectively disable this
> new feature, I thought that scale=100 was too small and suggested 1e10 (same as
> max for vacuum_cleanup_index_scale_factor since 4d54543ef).  That should allow
> handling the case that analyze is disabled, or its threshold is high, or it
> hasn't run yet, or it's running but hasn't finished, or analyze is triggered 
> at the
> same time as vacuum.

For disabling we instead should allow -1, and disable the feature if set
to < 0.

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-25 Thread Justin Pryzby
On Wed, Mar 25, 2020 at 12:46:52PM -0300, Alvaro Herrera wrote:
> On 2020-Mar-25, Justin Pryzby wrote:
> 
> > Maybe in the docs you can write this with thousands separators: 10,000,000
> > 
> > It looks like the GUC uses scale factor max=1e10, but the relopt is still
> > max=100, which makes it harder to disable for a single rel.
> 
> I have paid no attention to this thread, but how does it make sense to
> have a scale factor higher than 100?  Surely you mean the
> threshold value that should be set to ten million, not the scale factor?

We went over this here:
https://www.postgresql.org/message-id/20200317195616.GZ26184%40telsasoft.com
...
https://www.postgresql.org/message-id/20200317213426.GB26184%40telsasoft.com

The scale factor is relative to the reltuples estimate, which comes from vacuum
(which presently doesn't run against insert-only tables, and is what we're trying
to schedule), or analyze, which probably runs adequately, but might be disabled
or run too infrequently.

Since we talked about how scale_factor can be used to effectively disable this
new feature, I thought that scale=100 was too small and suggested 1e10 (same as
max for vacuum_cleanup_index_scale_factor since 4d54543ef).  That should allow
handling the case that analyze is disabled, or its threshold is high, or it
hasn't run yet, or it's running but hasn't finished, or analyze is triggered at the
same time as vacuum.

A table with 1e7 tuples (threshold) into which one inserts 1e9 tuples would hit
scale_factor=100 threshold, which means scale_factor failed to "disable" the
feature, as claimed.  If anything, I think it may need to be larger...

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-25 Thread Alvaro Herrera
On 2020-Mar-25, Justin Pryzby wrote:

> Maybe in the docs you can write this with thousands separators: 10,000,000
> 
> It looks like the GUC uses scale factor max=1e10, but the relopt is still
> max=100, which makes it harder to disable for a single rel.

I have paid no attention to this thread, but how does it make sense to
have a scale factor higher than 100?  Surely you mean the
threshold value that should be set to ten million, not the scale factor?

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-25 Thread Justin Pryzby
On Mon, Mar 23, 2020 at 02:27:29PM +0100, Laurenz Albe wrote:
> Here is version 10 of the patch, which uses a scale factor of 0.2.

Thanks

> Any table that has received more inserts since it was
> last vacuumed (and that is not vacuumed for another
> reason) will be autovacuumed.

Since this vacuum doesn't trigger any special behavior (freeze), you can remove
the parenthesized part: "(and that is not vacuumed for another reason)".

Maybe in the docs you can write this with thousands separators: 10,000,000

It looks like the GUC uses scale factor max=1e10, but the relopt is still
max=100, which makes it harder to disable for a single rel.

> +++ b/src/backend/access/common/reloptions.c
> @@ -398,6 +407,15 @@ static relopt_real realRelOpts[] =
>   },
>   -1, 0.0, 100.0
>   },
> + {
> + {
> + "autovacuum_vacuum_insert_scale_factor",
> + "Number of tuple inserts prior to vacuum as a fraction 
> of reltuples",
> + RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
> + ShareUpdateExclusiveLock
> + },
> + -1, 0.0, 100.0
> + },
>   {
>   {
>   "autovacuum_analyze_scale_factor",

> +++ b/src/backend/utils/misc/guc.c
> @@ -3549,6 +3558,17 @@ static struct config_real ConfigureNamesReal[] =
>   0.2, 0.0, 100.0,
>   NULL, NULL, NULL
>   },
> +
> + {
> + {"autovacuum_vacuum_insert_scale_factor", PGC_SIGHUP, 
> AUTOVACUUM,
> + gettext_noop("Number of tuple inserts prior to vacuum 
> as a fraction of reltuples."),
> + NULL
> + },
> + &autovacuum_vac_ins_scale,
> + 0.2, 0.0, 1e10,
> + NULL, NULL, NULL
> + },
> +
>   {
>   {"autovacuum_analyze_scale_factor", PGC_SIGHUP, AUTOVACUUM,
>   gettext_noop("Number of tuple inserts, updates, or 
> deletes prior to analyze as a fraction of reltuples."),




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-25 Thread Laurenz Albe
On Mon, 2020-03-23 at 14:27 +0100, Laurenz Albe wrote:
> Here is version 10 of the patch, which uses a scale factor of 0.2.

This patch should be something everybody can live with.

It would be good if we can get at least that committed before feature freeze.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-23 Thread Laurenz Albe
On Fri, 2020-03-20 at 14:43 +0100, Laurenz Albe wrote:
> > I.e. with the default settings we will perform a whole-index scan
> > (without visibility map or such) after every 10% growth of the
> > table. Which means that, even if the visibility map prevents repeated
> > tables accesses, increasing the rate of vacuuming for insert-only tables
> > can cause a lot more whole index scans.  Which means that vacuuming an
> > insert-only workload frequently *will* increase the total amount of IO,
> > even if there is not a single dead tuple. Rather than just spreading the
> > same amount of IO over more vacuums.
> > 
> > And both gin and gist just always do a full index scan, regardless of
> > vacuum_cleanup_index_scale_factor (either during a bulk delete, or
> > during the cleanup).  Thus more frequent vacuuming for insert-only
> > tables can cause a *lot* of pain (even an approx quadratic increase of
> > IO?  O(increased_frequency * peak_index_size)?) if you have large
> > indexes - which is very common for gin/gist.
> 
> In the light of that, I agree that we should increase the scale_factor.

Here is version 10 of the patch, which uses a scale factor of 0.2.

Yours,
Laurenz Albe
From 2160b241ccfb8cd1a684108db9cec55fb9d6f3c9 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 23 Mar 2020 14:23:56 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_threshold" and
"autovacuum_vacuum_insert_scale_factor" GUC and reloption.
The default value for the threshold is 1000;
the scale factor defaults to 0.2.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed.

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

The feature can also be used to vacuum insert-only
tables often enough to get an index-only scan.
For that, one would use a lower threshold and scale factor.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.  This value is available
in "pg_stat_*_tables" as "n_ins_since_vacuum".

Author: Laurenz Albe, based on a suggestion from Darafei Praliaskouski
Reviewed-by: David Rowley, Justin Pryzby, Masahiko Sawada, Andres Freund
Discussion: https://postgr.es/m/CAC8Q8t+j36G_bLF=+0imo6jgnwnlnwb1tujxujr-+x8zcct...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 41 +++
 doc/src/sgml/maintenance.sgml | 23 ---
 doc/src/sgml/monitoring.sgml  |  5 +++
 doc/src/sgml/ref/create_table.sgml| 30 ++
 src/backend/access/common/reloptions.c| 22 ++
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/postmaster/autovacuum.c   | 22 --
 src/backend/postmaster/pgstat.c   |  5 +++
 src/backend/utils/adt/pgstatfuncs.c   | 16 
 src/backend/utils/misc/guc.c  | 20 +
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  4 ++
 src/include/catalog/pg_proc.dat   |  5 +++
 src/include/pgstat.h  |  1 +
 src/include/postmaster/autovacuum.h   |  2 +
 src/include/utils/rel.h   |  2 +
 src/test/regress/expected/rules.out   |  3 ++
 17 files changed, 198 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 70854ae298..e50018c917 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7301,6 +7301,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_vacuum_insert_threshold (integer)
+  
+   autovacuum_vacuum_insert_threshold
+   configuration parameter
+  
+  
+  
+   
+Specifies the number of inserted tuples needed to trigger a
+VACUUM in any one table.
+The default is 1000 tuples.
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
+   
+  
+ 
+
  
   autovacuum_analyze_threshold (integer)
   
@@ -7342,6 +7362,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_vacuum_insert_scale_factor (floating point)
+  
+   autovacuum_vacuum_insert_scale_factor
+   configuration parameter
+  
+  
+  
+   
+Specifies a fraction of the table size to add to
+autovacuum_vacuum_insert_threshold
+when deciding whether to trigger a VACUUM.
+The default is 0.2 (20% of table size).
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
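
As a rough illustration of how the two new settings combine (a sketch, not part of the patch; it assumes the n_ins_since_vacuum column and the defaults described above), an insert-triggered vacuum becomes due once the inserts since the last vacuum exceed the threshold plus the scale factor times reltuples:

-- Sketch: distance of each table from an insert-triggered autovacuum,
-- using the proposed defaults (threshold 1000, scale factor 0.2).
SELECT s.relname,
       s.n_ins_since_vacuum,
       1000 + 0.2 * c.reltuples AS insert_vacuum_trigger_point
FROM pg_stat_user_tables AS s
JOIN pg_class AS c ON c.oid = s.relid
ORDER BY s.n_ins_since_vacuum DESC;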

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-20 Thread Laurenz Albe
On Thu, 2020-03-19 at 23:20 -0700, Andres Freund wrote:
> > I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
> 
> Based on two IM conversations I think it might be worth emphasizing how
> vacuum_cleanup_index_scale_factor works:
> 
> For btree, even if there is not a single deleted tuple, we can *still*
> end up doing a full index scan at the end of vacuum. As the docs describe
> vacuum_cleanup_index_scale_factor:
> 
>
> Specifies the fraction of the total number of heap tuples counted in
> the previous statistics collection that can be inserted without
> incurring an index scan at the VACUUM cleanup 
> stage.
> This setting currently applies to B-tree indexes only.
>
> 
> I.e. with the default settings we will perform a whole-index scan
> (without visibility map or such) after every 10% growth of the
> table. Which means that, even if the visibility map prevents repeated
> table accesses, increasing the rate of vacuuming for insert-only tables
> can cause a lot more whole index scans.  Which means that vacuuming an
> insert-only workload frequently *will* increase the total amount of IO,
> even if there is not a single dead tuple. Rather than just spreading the
> same amount of IO over more vacuums.
> 
> And both gin and gist just always do a full index scan, regardless of
> vacuum_cleanup_index_scale_factor (either during a bulk delete, or
> during the cleanup).  Thus more frequent vacuuming for insert-only
> tables can cause a *lot* of pain (even an approx quadratic increase of
> IO?  O(increased_frequency * peak_index_size)?) if you have large
> indexes - which is very common for gin/gist.

Ok, ok.  Thanks for the explanation.

In the light of that, I agree that we should increase the scale_factor.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-20 Thread Masahiko Sawada
On Fri, 20 Mar 2020 at 15:20, Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote:
> > On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> > > I don't think a default scale factor of 0 is going to be ok. For
> > > large-ish tables this will basically cause permanent vacuums. And it'll
> > > sometimes trigger for tables that actually coped well so far. 10 million
> > > rows could be a few seconds, not more.
> > >
> > > I don't think that the argument that otherwise a table might not get
> > > vacuumed before autovacuum_freeze_max_age is convincing enough.
> > >
> > > a) if that's indeed the argument, we should increase the default
> > >   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
> > >   the main argument against that from before isn't valid anymore.
> > >
> > > b) there's not really a good argument for vacuuming more often than
> > >   autovacuum_freeze_max_age for such tables. It'll not be frequent
> > >   enough to allow IOS for new data, and you're not preventing
> > >   anti-wraparound vacuums from happening.
> >
> > According to my reckoning, that is the remaining objection to the patch
> > as it is (with ordinary freezing behavior).
> >
> > How about a scale_factor of 0.005?  That will be high enough for large
> > tables, which seem to be the main concern here.
> >
> > I fully agree with your point a) - should that be part of the patch?
> >
> > I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
>
> Based on two IM conversations I think it might be worth emphasizing how
> vacuum_cleanup_index_scale_factor works:
>
> For btree, even if there is not a single deleted tuple, we can *still*
> end up doing a full index scan at the end of vacuum. As the docs describe
> vacuum_cleanup_index_scale_factor:
>
>
> Specifies the fraction of the total number of heap tuples counted in
> the previous statistics collection that can be inserted without
> incurring an index scan at the VACUUM cleanup 
> stage.
> This setting currently applies to B-tree indexes only.
>
>
> I.e. with the default settings we will perform a whole-index scan
> (without visibility map or such) after every 10% growth of the
> table. Which means that, even if the visibility map prevents repeated
> table accesses, increasing the rate of vacuuming for insert-only tables
> can cause a lot more whole index scans.  Which means that vacuuming an
> insert-only workload frequently *will* increase the total amount of IO,
> even if there is not a single dead tuple. Rather than just spreading the
> same amount of IO over more vacuums.

Right.

>
> And both gin and gist just always do a full index scan, regardless of
> vacuum_cleanup_index_scale_factor (either during a bulk delete, or
> during the cleanup).  Thus more frequent vacuuming for insert-only
> tables can cause a *lot* of pain (even an approx quadratic increase of
> IO?  O(increased_frequency * peak_index_size)?) if you have large
> indexes - which is very common for gin/gist.

That's right but for gin, more frequent vacuuming for insert-only
tables can help to clean up the pending list, which increases search
speed and is better than doing it in a backend process.
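
For reference, the backend-side alternative is merging the pending list when it exceeds gin_pending_list_limit, or doing so explicitly; a sketch, with a made-up index name:

-- Merge the GIN pending list into the main index structure
-- (the index name is only an example).
SELECT gin_clean_pending_list('my_gin_index');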

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-20 06:59:57 +0100, Laurenz Albe wrote:
> On Thu, 2020-03-19 at 15:17 -0700, Andres Freund wrote:
> > I am *VERY* doubtful that the attempt of using a large threshold, and a
> > tiny scale factor, is going to work out well. I'm not confident enough
> > in my gut feeling to full throatedly object, but confident enough that
> > I'd immediately change it on any important database I operated.
> > 
> > Independent of how large a constant you set the threshold to, for
> > databases with substantially bigger tables this will lead to [near]
> > constant vacuuming. As soon as you hit 1 billion rows - which isn't
> > actually that much - this is equivalent to setting
> > autovacuum_{vacuum,analyze}_scale_factor to 0.01. There's cases where
> > that can be a sensible setting, but I don't think anybody would suggest
> > it as a default.
> 
> In that, you are assuming that the bigger a table is, the more data
> modifications it will get, so that making the scale factor the dominant
> element will work out better.

> My experience is that it is more likely for the change rate (inserts,
> I am less certain about updates and deletes) to be independent of the
> table size.  (Too) many large databases are so large not because the
> data influx grows linearly over time, but because people don't want to
> get rid of old data (or would very much like to do so, but never planned
> for it).

I don't think growing ingest rate into insert only tables is exactly
rare. Maybe I've been too long in the Bay Area though.


> This second scenario would be much better served by a high threshold and
> a low scale factor.

I don't think that's really true. As soon as there's any gin/gist
indexes, a single non-HOT dead tuple, or a btree index grew by more
than vacuum_cleanup_index_scale_factor, indexes are scanned as a
whole. See the email I just concurrently happened to write:
https://postgr.es/m/20200320062031.uwagypenawujwajx%40alap3.anarazel.de

Which means that often each additional vacuum causes IO that's
proportional to the *total* index size, *not* the table size
delta. Which means that the difference in total IO basically is
O(increased_frequency * peak_table_size) in the worst case.




> > After thinking about it for a while, I think it's fundamentally flawed
> > to use large constant thresholds to avoid unnecessary vacuums. It's easy
> > to see cases where it's bad for common databases of today, but it'll be
> > much worse a few years down the line where common table sizes have grown
> > by a magnitude or two. Nor do they address the difference between tables
> > of a certain size with e.g. 2kb wide rows, and a same sized table with
> > 28 byte wide rows.  The point of constant thresholds imo can only be to
> > avoid unnecessary work at the *small* (even tiny) end, not the opposite.
> > 
> > 
> > I think there's too much "reinventing" autovacuum scheduling in a
> > "local" insert-only manner happening in this thread. And as far as I can
> > tell additionally only looking at a somewhat narrow slice of insert only
> > workloads.
> 
> Perhaps.  The traditional "high scale factor, low threshold" system
> is (in my perception) mostly based on the objective of cleaning up
> dead tuples.  When autovacuum was introduced, index only scans were
> only a dream.
> 
> With the objective of getting rid of dead tuples, having the scale factor
> be the dominant part makes sense: it is OK for bloat to be a certain
> percentage of the table size.
> 

As far as I can tell this argument doesn't make sense in light of the
fact that many vacuums trigger whole index scans, even if there are no
deleted tuples, as described above?


Even disregarding the index issue, I still don't think your argument is
very convincing.  For one, as I mentioned in another recent email, 10
million rows in a narrow table is something entirely different than 10
million rows in a very wide table. scale_factor doesn't have that
problem to the same degree.  Also, it's fairly obvious that this
argument doesn't hold in the general sense, otherwise we could just set
a threshold of, say, 1.

There's also the issue that frequent vacuums will often not be able to
mark most of the new data all-visible, due to concurrent
sessions. E.g. concurrent bulk loading sessions, analytics queries
actually looking at the data, replicas all can easily prevent data that
was just inserted from being marked 'all-visible' (not to speak of
frozen). That's not likely to be a problem in a purely oltp system that
inserts only single rows per xact, and has no longlived readers (nor
replicas with hs_feedback = on), but outside of that...


> Also, as you say, tables were much smaller then, and they will only
> become bigger in the future.  But I find that to be an argument *for*
> making the threshold the dominant element: otherwise, you vacuum less
> and less often, and the individual runs become larger and larger.

Which mostly is ok, because there are significant costs that scal

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote:
> On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> > I don't think a default scale factor of 0 is going to be ok. For
> > large-ish tables this will basically cause permanent vacuums. And it'll
> > sometimes trigger for tables that actually coped well so far. 10 million
> > rows could be a few seconds, not more.
> > 
> > I don't think that the argument that otherwise a table might not get
> > vacuumed before autovacuum_freeze_max_age is convincing enough.
> > 
> > a) if that's indeed the argument, we should increase the default
> >   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
> >   the main argument against that from before isn't valid anymore.
> > 
> > b) there's not really a good argument for vacuuming more often than
> >   autovacuum_freeze_max_age for such tables. It'll not be frequent
> >   enough to allow IOS for new data, and you're not preventing
> >   anti-wraparound vacuums from happening.
> 
> According to my reckoning, that is the remaining objection to the patch
> as it is (with ordinary freezing behavior).
> 
> How about a scale_factor of 0.005?  That will be high enough for large
> tables, which seem to be the main concern here.
> 
> I fully agree with your point a) - should that be part of the patch?
> 
> I am not sure about b).  In my mind, the objective is not to prevent
> anti-wraparound vacuums, but to see that they have less work to do,
> because previous autovacuum runs already have frozen anything older than
> vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> to freeze during any run would be at most one fourth of today's number
> when we hit autovacuum_freeze_max_age.

Based on two IM conversations I think it might be worth emphasizing how
vacuum_cleanup_index_scale_factor works:

For btree, even if there is not a single deleted tuple, we can *still*
end up doing a full index scan at the end of vacuum. As the docs describe
vacuum_cleanup_index_scale_factor:

   
Specifies the fraction of the total number of heap tuples counted in
the previous statistics collection that can be inserted without
incurring an index scan at the VACUUM cleanup stage.
This setting currently applies to B-tree indexes only.
   

I.e. with the default settings we will perform a whole-index scan
(without visibility map or such) after every 10% growth of the
table. Which means that, even if the visibility map prevents repeated
table accesses, increasing the rate of vacuuming for insert-only tables
can cause a lot more whole index scans.  Which means that vacuuming an
insert-only workload frequently *will* increase the total amount of IO,
even if there is not a single dead tuple. Rather than just spreading the
same amount of IO over more vacuums.

And both gin and gist just always do a full index scan, regardless of
vacuum_cleanup_index_scale_factor (either during a bulk delete, or
during the cleanup).  Thus more frequent vacuuming for insert-only
tables can cause a *lot* of pain (even an approx quadratic increase of
IO?  O(increased_frequency * peak_index_size)?) if you have large
indexes - which is very common for gin/gist.


Is there something missing in the above description?

Greetings,

Andres Freund
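
To put a number on the rule described above (a simplified sketch of the documented behaviour, not the exact implementation): a btree cleanup pass rescans the whole index once the tuples inserted since the previous statistics collection exceed vacuum_cleanup_index_scale_factor times the heap tuple count recorded then.

-- With the default vacuum_cleanup_index_scale_factor of 0.1, a table of
-- 100 million rows gets a full btree scan after roughly every
-- 10 million inserted rows, even with zero dead tuples.
SELECT 0.1 * 100000000 AS inserts_before_full_btree_scan;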




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Laurenz Albe
On Thu, 2020-03-19 at 14:38 -0700, Andres Freund wrote:
> > I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
> 
> This whole chain of arguments seems like it actually has little to do
> with vacuuming insert only/mostly tables. The same problem exists for
> tables that aren't insert only/mostly. Instead it IMO is an argument for
> a general change in logic about when to freeze.

My goal was to keep individual vacuum runs from having too much
work to do.  The freezing was an afterthought.

The difference (for me) is that I am more convinced that the insert
rate for insert-only table is constant over time than I am of the
update rate to be constant.

> What exactly is it that you want to achieve by having anti-wrap vacuums
> be quicker? If the goal is to reduce the window in which autovacuums
> aren't automatically cancelled when there's a conflicting lock request,
> or in which autovacuum just schedules based on xid age, then you can't
> have wraparound vacuums needing to do substantial amount of work.
> 
> Except for not auto-cancelling, and the autovac scheduling issue,
> there's really nothing magic about anti-wrap vacuums.

Yes.  I am under the impression that it is the duration and amount
of work per vacuum run that is the problem here, not the aggressiveness
as such.

If you are in the habit of frequently locking tables with high
lock modes (and I have seen people do that), you are lost anyway:
normal autovacuum runs will always die, and anti-wraparound vacuum
will kill you.  There is nothing we can do about that, except perhaps
put a fat warning in the documentation of LOCK.

> If the goal is to avoid redundant writes, then it's largely unrelated to
> anti-wrap vacuums, and can to a large degree addressed by
> opportunistically freezing (best even during hot pruning!).
> 
> 
> I am more and more convinced that it's a seriously bad idea to tie
> committing "autovacuum after inserts" to also committing a change in
> logic around freezing. That's not to say we shouldn't try to address
> both this cycle, but discussing them as if they really are one item
> makes it both more likely that we get nothing in, and more likely that
> we miss the larger picture.

I hear you, and I agree that we shouldn't do it with this patch.

> If there are no other modifications to the page, more aggressively
> freezing can lead to seriously increased write volume. It's quite normal
> to have databases where data in insert only tables *never* gets old
> enough to need to be frozen (either because xid usage is low, or because
> older partitions are dropped).  If data in an insert-only table isn't
> write-only, the hint bits are likely to already be set, which means that
> vacuum will just cause the entire table to be written another time,
> without a reason.
> 
> 
> I don't see how it's ok to substantially regress this very common
> workload. IMO this basically means that more aggressively and
> non-opportunistically freezing simply is a no-go (be it for insert or
> other causes for vacuuming).
> 
> What am I missing?

Nothing that I can see, and these are good examples why eager freezing
may not be such a smart idea after all.

I think your idea of freezing everything on a page when we know it is
going to be dirtied anyway is the smartest way of going about that.

My only remaining quibbles are about scale factor and threshold, see
my other mail.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Laurenz Albe
On Thu, 2020-03-19 at 15:17 -0700, Andres Freund wrote:
> I am doubtful it should be committed with the current settings. See below.
> 
> > From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001
> > From: Laurenz Albe 
> > Date: Thu, 19 Mar 2020 20:26:43 +0100
> > Subject: [PATCH] Autovacuum tables that have received only inserts
> > 
> > This avoids the known problem that insert-only tables
> > are never autovacuumed until they need to have their
> > anti-wraparound autovacuum, which then can be massive
> > and disruptive.
> 
> Shouldn't this also mention index only scans? IMO that's at least as big
> a problem as the "large vacuum" problem.

Yes, that would be good.

> I am *VERY* doubtful that the attempt of using a large threshold, and a
> tiny scale factor, is going to work out well. I'm not confident enough
> in my gut feeling to full throatedly object, but confident enough that
> I'd immediately change it on any important database I operated.
> 
> Independent of how large a constant you set the threshold to, for
> databases with substantially bigger tables this will lead to [near]
> constant vacuuming. As soon as you hit 1 billion rows - which isn't
> actually that much - this is equivalent to setting
> autovacuum_{vacuum,analyze}_scale_factor to 0.01. There's cases where
> that can be a sensible setting, but I don't think anybody would suggest
> it as a default.

In that, you are assuming that the bigger a table is, the more data
modifications it will get, so that making the scale factor the dominant
element will work out better.

My experience is that it is more likely for the change rate (inserts,
I am less certain about updates and deletes) to be independent of the
table size.  (Too) many large databases are so large not because the
data influx grows linearly over time, but because people don't want to
get rid of old data (or would very much like to do so, but never planned
for it).

This second scenario would be much better served by a high threshold and
a low scale factor.

> After thinking about it for a while, I think it's fundamentally flawed
> to use large constant thresholds to avoid unnecessary vacuums. It's easy
> to see cases where it's bad for common databases of today, but it'll be
> much worse a few years down the line where common table sizes have grown
> by a magnitude or two. Nor do they address the difference between tables
> of a certain size with e.g. 2kb wide rows, and a same sized table with
> 28 byte wide rows.  The point of constant thresholds imo can only be to
> avoid unnecessary work at the *small* (even tiny) end, not the opposite.
> 
> 
> I think there's too much "reinventing" autovacuum scheduling in a
> "local" insert-only manner happening in this thread. And as far as I can
> tell additionally only looking at a somewhat narrow slice of insert only
> workloads.

Perhaps.  The traditional "high scale factor, low threshold" system
is (in my perception) mostly based on the objective of cleaning up
dead tuples.  When autovacuum was introduced, index only scans were
only a dream.

With the objective of getting rid of dead tuples, having the scale factor
be the dominant part makes sense: it is OK for bloat to be a certain
percentage of the table size.

Also, as you say, tables were much smaller then, and they will only
become bigger in the future.  But I find that to be an argument *for*
making the threshold the dominant element: otherwise, you vacuum less
and less often, and the individual runs become larger and larger.
Now that vacuum skips pages where it knows it has nothing to do,
doesn't take away much of the pain of vacuuming large tables where
nothing much has changed?

> I, again, strongly suggest using much more conservative values here. And
> then try to address the shortcomings - like not freezing aggressively
> enough - in separate patches (and by now separate releases, in all
> likelihood).

There is much to say for that, I agree.

> This will have a huge impact on a lot of postgres
> installations. Autovacuum already is perceived as one of the biggest
> issues around postgres. If the ratio of cases where these changes
> improve things to the cases it regresses isn't huge, it'll be painful
> (silent improvements are obviously less noticed than breakages).

Yes, that makes it scary to mess with autovacuum.

One of the problems I see in the course of this discussion is that one
can always come up with examples that make any choice look bad.
It is impossible to do it right for everybody.

In the light of that, I won't object to a more conservative default
value for the parameters, even though my considerations above suggest
to me the opposite.  But perhaps my conclusions are based on flawed
premises.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-20 15:05:03 +1300, David Rowley wrote:
> On Fri, 20 Mar 2020 at 11:17, Andres Freund  wrote:
> > I think there's too much "reinventing" autovacuum scheduling in a
> > "local" insert-only manner happening in this thread. And as far as I can
> > tell additionally only looking at a somewhat narrow slice of insert only
> > workloads.
> 
> I understand your concern and you might be right. However, I think the
> main reason that the default settings for the new threshold and scale
> factor have deviated this far from the existing settings is
> the example of a large insert-only table that receives inserts of 1
> row per xact.  If we were to copy the existing settings then when that
> table gets to 1 billion rows, it would be eligible for an
> insert-vacuum after 200 million tuples/xacts, which does not help the
> situation since an anti-wraparound vacuum would be triggering then
> anyway.

Sure, that'd happen for inserts that happen after that threshold. I'm
just not convinced that this is as huge a problem as presented in this
thread. And I'm fairly convinced the proposed solution is the wrong
direction to go into.

It's not like that's not an issue for updates? If you update one row per
transaction, then you run into exactly the same issue for a table of the
same size?  You maybe could argue that it's more common to insert 1
billion tuples in individual transaction, than it is to update 1 billion
tuples in individual transactions, but I don't think it's a huge
difference if it even exist.

In fact the problem is worse for the update case, because that tends to
generate a lot more random looking IO during vacuum (both because only
parts of the table are updated causing small block reads/writes, and
because it will need [multiple] index scans/vacuum, and because the
vacuum is a lot more expensive CPU time wise).

Imo this line of reasoning is about adding autovacuum scheduling based
on xid age, not about insert only workloads.

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread David Rowley
On Fri, 20 Mar 2020 at 11:17, Andres Freund  wrote:
> I think there's too much "reinventing" autovacuum scheduling in a
> "local" insert-only manner happening in this thread. And as far as I can
> tell additionally only looking at a somewhat narrow slice of insert only
> workloads.

I understand your concern and you might be right. However, I think the
main reason that the default settings for the new threshold and scale
factor have deviated this far from the existing settings is
the example of a large insert-only table that receives inserts of 1
row per xact.  If we were to copy the existing settings then when that
table gets to 1 billion rows, it would be eligible for an
insert-vacuum after 200 million tuples/xacts, which does not help the
situation since an anti-wraparound vacuum would be triggering then
anyway.

I'm unsure if it will help with the discussion, but I put together a
quick and dirty C program to show when a table will be eligible for an
auto-vacuum with the given scale_factor and threshold

$ gcc -O2 vacuum.c -o vacuum
$ ./vacuum
Syntax ./vacuum <scale_factor> <threshold> <max_tuples>
$ ./vacuum 0.01 10000000 100000000000 | tail -n 1
Vacuum 463 at 99183465731 reltuples, 991915456 inserts
$ ./vacuum 0.2 50 100000000000 | tail -n 1
Vacuum 108 at 90395206733 reltuples, 15065868288 inserts

So, yeah, certainly, there are more than four times as many vacuums
with an insert-only table of 100 billion rows using the proposed
settings vs the defaults for the existing scale_factor and threshold.
However, at the tail end of the first run there, we were close to a
billion rows (991,915,456) between vacuums. Is that too excessive?

I'm sharing this in the hope that it'll make it easy to experiment
with settings which we can all agree on.

For a 1 billion row table, the proposed settings give us 69 vacuums
and the standard settings 83.
#include <stdio.h>
#include <stdlib.h>

unsigned long long
atoull(char *a)
{
unsigned long long val = 0;
/* crude, no wraparound checks */
while (*a >= '0' && *a <= '9')
{
val = val * 10 + *a - '0';
a++;
}
return val;
}

int
main(int argc, char **argv)
{
unsigned long long reltuples;
float scale_factor;
unsigned long long threshold;
unsigned long long n_vacuums;
unsigned long long n_inserts;
unsigned long long max_tuples;

if (argc < 4)
{
fprintf(stderr, "Syntax %s <scale_factor> <threshold> <max_tuples>\n", argv[0]);
return -1;
}


scale_factor = atof(argv[1]);
threshold = atoull(argv[2]);
max_tuples = atoull(argv[3]);
reltuples = 1;
n_vacuums = 0;

printf("scale_factor = %g, threshold = %llu\n", scale_factor, threshold);

for(;;)
{
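/*
 * Inserts needed to reach the next trigger point:
 * threshold + scale_factor * reltuples, plus one to step past it.
 */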
n_inserts = threshold + reltuples * scale_factor + 1;

/* do "vacuum" */
n_vacuums++;
reltuples += n_inserts;

if (reltuples > max_tuples)
break;
printf("Vacuum %llu at %llu reltuples, %llu inserts\n", n_vacuums, 
reltuples, n_inserts);
}

return 0;
}


Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-20 01:11:23 +0300, Darafei "Komяpa" Praliaskouski wrote:
> > > According to my reckoning, that is the remaining objection to the patch
> > > as it is (with ordinary freezing behavior).
> > >
> > > How about a scale_factor of 0.005?  That will be high enough for large
> > > tables, which seem to be the main concern here.
> >
> > Seems low on a first blush. On a large-ish table with 1 billion tuples,
> > we'd vacuum every 5 million inserts. For many ETL workloads this will
> > result in a vacuum after every bulk operation. Potentially with an index
> > scan associated (even if there's no errors, a lot of bulk loads use ON
> > CONFLICT INSERT leading to the occasional update).
> 
> This is a good and wanted thing.

I don't think that's true in general. As proposed this can increase the
overall amount of IO (both reads and writes) due to vacuum by a *LOT*.


> Upthread it was already suggested that "everyone knows to vacuum after
> bulk operations". This will go and vacuum the data while it's hot and
> in caches, not afterwards, reading from disk.

For many bulk load cases the data will not be in cache, in particular not
when individual bulk inserts are more than a few gigabytes.


> The problem hit by Mandrill is simple: in modern cloud environments
> it's sometimes simply impossible to read all the data on disk because
> of different kinds of throttling.

Yes. Which is one of the reasons why this has the potential to cause
serious issues. The proposed changes very often will *increase* the
total amount of IO. A good fraction of the time that will be "hidden" by
caching, but it'll be far from all the time.


> At some point your production database just shuts down and asks to
> VACUUM in single user mode for 40 days.

That basically has nothing to do with what we're talking about here. The
default wraparound trigger is 200 million xids, and shutdowns start at
more than 2 billion xids. If an anti-wrap autovacuum can't finish within
2 billion rows, then this won't be addressed by vacuuming more
frequently (including more frequent index scans, causing a lot more
IO!).

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 20:47:40 +0100, Laurenz Albe wrote:
> On Thu, 2020-03-19 at 21:39 +1300, David Rowley wrote:
> > I've attached a small fix which I'd like to apply to your v8 patch.
> > With that, and pending one final look, I'd like to push this during my
> > Monday (New Zealand time).  So if anyone strongly objects to that,
> > please state their case before then.

I am doubtful it should be committed with the current settings. See below.


> From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001
> From: Laurenz Albe 
> Date: Thu, 19 Mar 2020 20:26:43 +0100
> Subject: [PATCH] Autovacuum tables that have received only inserts
>
> Add "autovacuum_vacuum_insert_threshold" and
> "autovacuum_vacuum_insert_scale_factor" GUC and reloption.
> The default value for the threshold is 1000;
> the scale factor defaults to 0.01.
>
> Any table that has received more inserts since it was
> last vacuumed (and that is not vacuumed for another
> reason) will be autovacuumed.
>
> This avoids the known problem that insert-only tables
> are never autovacuumed until they need to have their
> anti-wraparound autovacuum, which then can be massive
> and disruptive.

Shouldn't this also mention index only scans? IMO that's at least as big
a problem as the "large vacuum" problem.


> +  xreflabel="autovacuum_vacuum_insert_threshold">
> +  autovacuum_vacuum_insert_threshold 
> (integer)
> +  
> +   
> autovacuum_vacuum_insert_threshold
> +   configuration parameter
> +  
> +  
> +  
> +   
> +Specifies the number of inserted tuples needed to trigger a
> +VACUUM in any one table.
> +The default is 1000 tuples.
> +This parameter can only be set in the 
> postgresql.conf
> +file or on the server command line;
> +but the setting can be overridden for individual tables by
> +changing table storage parameters.
> +   
> +  
> + 
> +
>xreflabel="autovacuum_analyze_threshold">
>autovacuum_analyze_threshold 
> (integer)
>
> @@ -7342,6 +7362,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' 
> WITH csv;
>
>   
>
> +  xreflabel="autovacuum_vacuum_insert_scale_factor">
> +  autovacuum_vacuum_insert_scale_factor 
> (floating point)
> +  
> +   
> autovacuum_vacuum_insert_scale_factor
> +   configuration parameter
> +  
> +  
> +  
> +   
> +Specifies a fraction of the table size to add to
> +autovacuum_vacuum_insert_threshold
> +when deciding whether to trigger a VACUUM.
> +The default is 0.01 (1% of table size).
> +This parameter can only be set in the 
> postgresql.conf
> +file or on the server command line;
> +but the setting can be overridden for individual tables by
> +changing table storage parameters.
> +   
> +  
> + 
> +

I am *VERY* doubtful that the attempt of using a large threshold, and a
tiny scale factor, is going to work out well. I'm not confident enough
in my gut feeling to full throatedly object, but confident enough that
I'd immediately change it on any important database I operated.

Independent of how large a constant you set the threshold to, for
databases with substantially bigger tables this will lead to [near]
constant vacuuming. As soon as you hit 1 billion rows - which isn't
actually that much - this is equivalent to setting
autovacuum_{vacuum,analyze}_scale_factor to 0.01. There's cases where
that can be a sensible setting, but I don't think anybody would suggest
it as a default.


After thinking about it for a while, I think it's fundamentally flawed
to use large constant thresholds to avoid unnecessary vacuums. It's easy
to see cases where it's bad for common databases of today, but it'll be
much worse a few years down the line where common table sizes have grown
by a magnitude or two. Nor do they address the difference between tables
of a certain size with e.g. 2kb wide rows, and a same sized table with
28 byte wide rows.  The point of constant thresholds imo can only be to
avoid unnecessary work at the *small* (even tiny) end, not the opposite.


I think there's too much "reinventing" autovacuum scheduling in a
"local" insert-only manner happening in this thread. And as far as I can
tell additionally only looking at a somewhat narrow slice of insert only
workloads.


I, again, strongly suggest using much more conservative values here. And
then try to address the shortcomings - like not freezing aggressively
enough - in separate patches (and by now separate releases, in all
likelihood).


This will have a huge impact on a lot of postgres
installations. Autovacuum already is perceived as one of the biggest
issues around postgres. If the ratio of cases where these changes
improve things to the cases it regresses isn't huge, it'll be painful
(silent improvements are obviously less noticed than breakages).

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Komяpa
> > According to my reckoning, that is the remaining objection to the patch
> > as it is (with ordinary freezing behavior).
> >
> > How about a scale_factor of 0.005?  That will be high enough for large
> > tables, which seem to be the main concern here.
>
> Seems low on a first blush. On a large-ish table with 1 billion tuples,
> we'd vacuum every 5 million inserts. For many ETL workloads this will
> result in a vacuum after every bulk operation. Potentially with an index
> scan associated (even if there's no errors, a lot of bulk loads use ON
> CONFLICT INSERT leading to the occasional update).

This is a good and wanted thing. Upthread it was already suggested
that "everyone knows to vacuum after bulk operations". This will go and vacuum
the data while it's hot and in caches, not afterwards, reading from disk.


> > I am not sure about b).  In my mind, the objective is not to prevent
> > anti-wraparound vacuums, but to see that they have less work to do,
> > because previous autovacuum runs already have frozen anything older than
> > vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> > to freeze during any run would be at most one fourth of today's number
> > when we hit autovacuum_freeze_max_age.
>
> This whole chain of arguments seems like it actually has little to do
> with vacuuming insert only/mostly tables. The same problem exists for
> tables that aren't insert only/mostly. Instead it IMO is an argument for
> a general change in logic about when to freeze.
>
> What exactly is it that you want to achieve by having anti-wrap vacuums
> be quicker? If the goal is to reduce the window in which autovacuums
> aren't automatically cancelled when there's a conflicting lock request,
> or in which autovacuum just schedules based on xid age, then you can't
> have wraparound vacuums needing to do substantial amount of work.

The problem hit by Mandrill is simple: in modern cloud environments
it's sometimes simply impossible to read all the data on disk because
of different kinds of throttling.
At some point your production database just shuts down and asks to
VACUUM in single user mode for 40 days.

You want vacuum to happen long before that, preferably when the data
is still in RAM, or, at least, fits your cloud provider's disk burst
performance budget, where performance of block device resembles that
of an SSD and not of a Floppy Disk.

Some more reading on how that works:
https://aws.amazon.com/ru/blogs/database/understanding-burst-vs-baseline-performance-with-amazon-rds-and-gp2/

-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Andres Freund
Hi,

On 2020-03-19 06:45:48 +0100, Laurenz Albe wrote:
> On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> > I don't think a default scale factor of 0 is going to be ok. For
> > large-ish tables this will basically cause permanent vacuums. And it'll
> > sometimes trigger for tables that actually coped well so far. 10 million
> > rows could be a few seconds, not more.
> > 
> > I don't think that the argument that otherwise a table might not get
> > vacuumed before autovacuum_freeze_max_age is convincing enough.
> > 
> > a) if that's indeed the argument, we should increase the default
> >   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
> >   the main argument against that from before isn't valid anymore.
> > 
> > b) there's not really a good argument for vacuuming more often than
> >   autovacuum_freeze_max_age for such tables. It'll not be frequent
> >   enough to allow IOS for new data, and you're not preventing
> >   anti-wraparound vacuums from happening.
> 
> According to my reckoning, that is the remaining objection to the patch
> as it is (with ordinary freezing behavior).
> 
> How about a scale_factor of 0.005?  That will be high enough for large
> tables, which seem to be the main concern here.

Seems low on a first blush. On a large-ish table with 1 billion tuples,
we'd vacuum every 5 million inserts. For many ETL workloads this will
result in a vacuum after every bulk operation. Potentially with an index
scan associated (even if there's no errors, a lot of bulk loads use ON
CONFLICT INSERT leading to the occasional update).

Personally I think we should be considerably more conservative in the
first release or two. Exposing a lot of people that previously didn't
have a lot of problems to vacuuming being *massively* more aggressive,
basically permanently running on an insert only table, will be bad.


> I fully agree with your point a) - should that be part of the patch?

We can just make it a seperate patch committed shortly afterwards.


> I am not sure about b).  In my mind, the objective is not to prevent
> anti-wraparound vacuums, but to see that they have less work to do,
> because previous autovacuum runs already have frozen anything older than
> vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> to freeze during any run would be at most one fourth of today's number
> when we hit autovacuum_freeze_max_age.

This whole chain of arguments seems like it actually has little to do
with vacuuming insert only/mostly tables. The same problem exists for
tables that aren't insert only/mostly. Instead it IMO is an argument for
a general change in logic about when to freeze.

What exactly is it that you want to achieve by having anti-wrap vacuums
be quicker? If the goal is to reduce the window in which autovacuums
aren't automatically cancelled when there's a conflicting lock request,
or in which autovacuum just schedules based on xid age, then you can't
have wraparound vacuums needing to do substantial amount of work.

Except for not auto-cancelling, and the autovac scheduling issue,
there's really nothing magic about anti-wrap vacuums.


If the goal is to avoid redundant writes, then it's largely unrelated to
anti-wrap vacuums, and can to a large degree addressed by
opportunistically freezing (best even during hot pruning!).


I am more and more convinced that it's a seriously bad idea to tie
committing "autovacuum after inserts" to also committing a change in
logic around freezing. That's not to say we shouldn't try to address
both this cycle, but discussing them as if they really are one item
makes it both more likely that we get nothing in, and more likely that
we miss the larger picture.


> I am still sorry to see more proactive freezing go, which would
> reduce the impact for truly insert-only tables.
> After sleeping on it, here is one last idea.
> 
> Granted, freezing with vacuum_freeze_min_age = 0 poses a problem
> for those parts of the table that will receive updates or deletes.

IMO it's not at all just those regions that are potentially negatively
affected:
If there are no other modifications to the page, more aggressively
freezing can lead to seriously increased write volume. It's quite normal
to have databases where data in insert only tables *never* gets old
enough to need to be frozen (either because xid usage is low, or because
older partitions are dropped).  If data in an insert-only table isn't
write-only, the hint bits are likely to already be set, which means that
vacuum will just cause the entire table to be written another time,
without a reason.


I don't see how it's ok to substantially regress this very common
workload. IMO this basically means that more aggressively and
non-opportunistically freezing simply is a no-go (be it for insert or
other causes for vacuuming).

What am I missing?

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Laurenz Albe
On Thu, 2020-03-19 at 21:39 +1300, David Rowley wrote:
> > According to my reckoning, that is the remaining objection to the patch
> > as it is (with ordinary freezing behavior).
> >
> > How about a scale_factor of 0.005?  That will be high enough for large
> > tables, which seem to be the main concern here.
> 
> I agree with that, however, I'd thought 0.01, just so we're still
> close to having about 100 times less work to do for huge insert-only
> tables when it comes to having to perform an anti-wraparound vacuum.

Fine with me.

> > I am still sorry to see more proactive freezing go, which would
> > reduce the impact for truly insert-only tables.
> > After sleeping on it, here is one last idea.
> >
> > Granted, freezing with vacuum_freeze_min_age = 0 poses a problem
> > for those parts of the table that will receive updates or deletes.
> > But what if insert-triggered vacuum operates with - say -
> > one tenth of vacuum_freeze_min_age (unless explicitly overridden
> > for the table)?  That might still be high enough not to needlessly
> > freeze too many tuples that will still be modified, but it will
> > reduce the impact on insert-only tables.
> 
> I think that might be a bit too magical and may not be what some
> people want. I know that most people won't set
> autovacuum_freeze_min_age to 0 for insert-only tables, but we can at
> least throw something in the documents to mention it's a good idea,
> however, looking over the docs I'm not too sure the best place to note
> that down.

I was afraid that idea would be too cute to appeal.

> I've attached a small fix which I'd like to apply to your v8 patch.
> With that, and pending one final look, I'd like to push this during my
> Monday (New Zealand time).  So if anyone strongly objects to that,
> please state their case before then.

Thanks!

I have rolled your edits into the attached patch v9, rebased against
current master.

Yours,
Laurenz Albe
From 3ba4b572d82969bbb2af787d1bccc72f417ad3a0 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Thu, 19 Mar 2020 20:26:43 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_threshold" and
"autovacuum_vacuum_insert_scale_factor" GUC and reloption.
The default value for the threshold is 1000;
the scale factor defaults to 0.01.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed.

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.  This value is available
in "pg_stat_*_tables" as "n_ins_since_vacuum".

Author: Laurenz Albe, based on a suggestion from Darafei Praliaskouski
Reviewed-by: David Rowley, Justin Pryzby, Masahiko Sawada, Andres Freund
Discussion: https://postgr.es/m/CAC8Q8t+j36G_bLF=+0imo6jgnwnlnwb1tujxujr-+x8zcct...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 41 +++
 doc/src/sgml/maintenance.sgml | 23 ---
 doc/src/sgml/monitoring.sgml  |  5 +++
 doc/src/sgml/ref/create_table.sgml| 30 ++
 src/backend/access/common/reloptions.c| 22 ++
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/postmaster/autovacuum.c   | 22 --
 src/backend/postmaster/pgstat.c   |  5 +++
 src/backend/utils/adt/pgstatfuncs.c   | 16 
 src/backend/utils/misc/guc.c  | 20 +
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  4 ++
 src/include/catalog/pg_proc.dat   |  5 +++
 src/include/pgstat.h  |  1 +
 src/include/postmaster/autovacuum.h   |  2 +
 src/include/utils/rel.h   |  2 +
 src/test/regress/expected/rules.out   |  3 ++
 17 files changed, 198 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 70854ae298..d1ee8e53f2 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7301,6 +7301,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_vacuum_insert_threshold (integer)
+  
+   autovacuum_vacuum_insert_threshold
+   configuration parameter
+  
+  
+  
+   
+Specifies the number of inserted tuples needed to trigger a
+VACUUM in any one table.
+The default is 1000 tuples.
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
+   
+  
+ 
+
  
   autovac

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread Justin Pryzby
On Thu, Mar 19, 2020 at 09:52:11PM +1300, David Rowley wrote:
> On Thu, 19 Mar 2020 at 19:07, Justin Pryzby  wrote:
> >
> > On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> > > On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote:
> > > > Having now played with the patch, I'll suggest that 1000 is too 
> > > > high a
> > > > threshold.  If autovacuum runs without FREEZE, I don't see why it 
> > > > couldn't be
> > > > much lower (10?) or use (0.2 * n_ins + 50) like the other 
> > > > autovacuum GUC.
> > >
> > > ISTM that the danger of regressing workloads due to suddenly repeatedly
> > > scanning huge indexes that previously were never / rarely scanned is
> > > significant (if there's a few dead tuples, otherwise most indexes will
> > > be able to skip the scan since the vacuum_cleanup_index_scale_factor
> > > introduction)).
> >
> > We could try to avoid that issue here:
> >
> > |/* If any tuples need to be deleted, perform final vacuum cycle */
> > |/* XXX put a threshold on min number of tuples here? */
> > |if (dead_tuples->num_tuples > 0)
> > |{
> > |/* Work on all the indexes, and then the heap */
> > |lazy_vacuum_all_indexes(onerel, Irel, indstats, 
> > vacrelstats,
> > |lps, 
> > nindexes);
> > |
> > |/* Remove tuples from heap */
> > |lazy_vacuum_heap(onerel, vacrelstats);
> > |}
> >
> > As you said, an insert-only table can skip scanning indexes, but an
> > insert-mostly table currently cannot.
> >
> > Maybe we could skip the final index scan if we hit the autovacuum insert
> > threshold?
> >
> > I still don't like mixing the thresholds with the behavior they imply, but
> > maybe what's needed is better docs describing all of vacuum's roles and its
> > procedure and priority in executing them.
> >
> > The dead tuples would just be cleaned up during a future vacuum, right ?  So
> > that would be less efficient, but (no surprise) there's a balance to strike 
> > and
> > that can be tuned.  I think that wouldn't be an issue for most people; the
> > worst case would be if you set high maint_work_mem, and low insert 
> > threshold,
> > and you got increased bloat.  But faster vacuum if we avoided idx scans.
> >
> > That might allow more flexibility in our discussion around default values 
> > for
> > thresholds for insert-triggered vacuum.
> 
> We went over this a bit already. The risk is that if you have an
> insert-mostly table and always trigger an auto-vacuum for inserts and
> never due to dead tuples, then you'll forego the index cleanup every
> time causing the indexes to bloat over time.

At the time, we were talking about skipping index *cleanup* phase.
Which also incurs an index scan.
>+  tab->at_params.index_cleanup = insert_only ? VACOPT_TERNARY_DISABLED : VACOPT_TERNARY_DEFAULT;
We decided not to skip this, since it would allow index bloat, if vacuum were
only ever run due to inserts, and cleanup never happened.

I'm suggesting the possibility of skipping not index *cleanup* but index (and
heap) *vacuum*.  So that saves an index scan itself, and I think implies later
skipping cleanup (since no index tuples were removed).  But more importantly, I
think if we skip that during an insert-triggered vacuum, the dead heap tuples
are still there during the next vacuum instance.  So unlike index cleanup
(where we don't keep track of the total number of dead index tuples), this can
accumulate over time, and eventually trigger index+heap vacuum, and cleanup.

> I think any considerations to add some sort of threshold on dead
> tuples before cleaning the index should be considered independently.

+1, yes.  I'm hoping to anticipate and mitigate any objections and regressions
more than raise them.

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread David Rowley
On Thu, 19 Mar 2020 at 19:07, Justin Pryzby  wrote:
>
> On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> > On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote:
> > > Having now played with the patch, I'll suggest that 1000 is too high a
> > > threshold.  If autovacuum runs without FREEZE, I don't see why it 
> > > couldn't be
> > > much lower (10?) or use (0.2 * n_ins + 50) like the other autovacuum 
> > > GUC.
> >
> > ISTM that the danger of regressing workloads due to suddenly repeatedly
> > scanning huge indexes that previously were never / rarely scanned is
> > significant (if there's a few dead tuples, otherwise most indexes will
> > be able to skip the scan since the vacuum_cleanup_index_scale_factor
> > introduction)).
>
> We could try to avoid that issue here:
>
> |/* If any tuples need to be deleted, perform final vacuum cycle */
> |/* XXX put a threshold on min number of tuples here? */
> |if (dead_tuples->num_tuples > 0)
> |{
> |/* Work on all the indexes, and then the heap */
> |lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats,
> |lps, 
> nindexes);
> |
> |/* Remove tuples from heap */
> |lazy_vacuum_heap(onerel, vacrelstats);
> |}
>
> As you said, an insert-only table can skip scanning indexes, but an
> insert-mostly table currently cannot.
>
> Maybe we could skip the final index scan if we hit the autovacuum insert
> threshold?
>
> I still don't like mixing the thresholds with the behavior they imply, but
> maybe what's needed is better docs describing all of vacuum's roles and its
> procedure and priority in executing them.
>
> The dead tuples would just be cleaned up during a future vacuum, right ?  So
> that would be less efficient, but (no surprise) there's a balance to strike 
> and
> that can be tuned.  I think that wouldn't be an issue for most people; the
> worst case would be if you set high maint_work_mem, and low insert threshold,
> and you got increased bloat.  But faster vacuum if we avoided idx scans.
>
> That might allow more flexibility in our discussion around default values for
> thresholds for insert-triggered vacuum.

We went over this a bit already. The risk is that if you have an
insert-mostly table and always trigger an auto-vacuum for inserts and
never due to dead tuples, then you'll forego the index cleanup every
time causing the indexes to bloat over time.

I think any considerations to add some sort of threshold on dead
tuples before cleaning the index should be considered independently.
Trying to get everyone to agree to what's happening here is hard
enough without adding more options to the list.  I understand that
there may be small issues with insert-only tables with a tiny number
of dead tuples, perhaps due to aborts, which could cause some issues while
scanning the index, but that's really one of the big reasons why the
10 million insert threshold has been added. Just in the past few hours
we've talked about having a very small scale factor to protect from
over-vacuum on huge tables that see 10 million tuples inserted in
short spaces of time.  I think that's a good compromise, however,
certainly not perfect.

David




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-19 Thread David Rowley
On Thu, 19 Mar 2020 at 18:45, Laurenz Albe  wrote:
>
> On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> > I don't think a default scale factor of 0 is going to be ok. For
> > large-ish tables this will basically cause permanent vacuums. And it'll
> > sometimes trigger for tables that actually coped well so far. 10 million
> > rows could be a few seconds, not more.
> >
> > I don't think that the argument that otherwise a table might not get
> > vacuumed before autovacuum_freeze_max_age is convincing enough.
> >
> > a) if that's indeed the argument, we should increase the default
> >   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
> >   the main argument against that from before isn't valid anymore.
> >
> > b) there's not really a good argument for vacuuming more often than
> >   autovacuum_freeze_max_age for such tables. It'll not be frequent
> >   enough to allow IOS for new data, and you're not preventing
> >   anti-wraparound vacuums from happening.
>
> According to my reckoning, that is the remaining objection to the patch
> as it is (with ordinary freezing behavior).
>
> How about a scale_factor of 0.005?  That will be high enough for large
> tables, which seem to be the main concern here.

I agree with that, however, I'd thought 0.01, just so we're still
close to having about 100 times less work to do for huge insert-only
tables when it comes to having to perform an anti-wraparound vacuum.

> I fully agree with your point a) - should that be part of the patch?

I think it will be a good idea to increase this, but I really don't
think this patch should be touching it.  It's something to put on the
issues list for after the CF so more people have the bandwidth to chip
in their thoughts.

> I am not sure about b).  In my mind, the objective is not to prevent
> anti-wraparound vacuums, but to see that they have less work to do,
> because previous autovacuum runs already have frozen anything older than
> vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
> to freeze during any run would be at most one fourth of today's number
> when we hit autovacuum_freeze_max_age.

I hear what Andres is saying about proactive freezing for already
dirty pages.  I think that's worth looking into, but don't feel like
we need to do it for this patch. The patch is worthy without it and
such a change affects more than insert-vacuums, so should be a
separate commit.

If people really do have an insert-only table then we can recommend
that they set the table's autovacuum_freeze_min_age to 0.
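For example, a minimal per-table override along those lines (the table name
is just a placeholder) would be:

    ALTER TABLE measurements SET (autovacuum_freeze_min_age = 0);

so that vacuums of that table freeze tuples right away instead of leaving
them for a later anti-wraparound run.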

> I am still sorry to see more proactive freezing go, which would
> reduce the impact for truly insert-only tables.
> After sleeping on it, here is one last idea.
>
> Granted, freezing with vacuum_freeze_min_age = 0 poses a problem
> for those parts of the table that will receive updates or deletes.
> But what if insert-triggered vacuum operates with - say -
> one tenth of vacuum_freeze_min_age (unless explicitly overridden
> for the table)?  That might still be high enough not to needlessly
> freeze too many tuples that will still be modified, but it will
> reduce the impact on insert-only tables.

I think that might be a bit too magical and may not be what some
people want. I know that most people won't set
autovacuum_freeze_min_age to 0 for insert-only tables, but we can at
least throw something in the documentation to mention that it's a good idea.
However, looking over the docs I'm not too sure of the best place to note
that down.

I've attached a small fix which I'd like to apply to your v8 patch.
With that, and pending one final look, I'd like to push this during my
Monday (New Zealand time).  So if anyone strongly objects to that,
please state their case before then.

David
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7befc63860..6cad079132 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7342,7 +7342,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH 
csv;
 Specifies a fraction of the table size to add to
 autovacuum_vacuum_insert_threshold
 when deciding whether to trigger a VACUUM.
-The default is 0.0, which means that the table size has no effect.
+The default is 0.01 (1% of table size).
 This parameter can only be set in the 
postgresql.conf
 file or on the server command line;
 but the setting can be overridden for individual tables by
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index dbf418c62a..904fbffd94 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -777,20 +777,28 @@ vacuum threshold = vacuum base threshold + vacuum scale 
factor * number of tuple
 ,
 and the number of tuples is
 pg_class.reltuples.
-The number of obsolete tuples is obtained from the statistics
-collector; it is a semi-accurate count updated by each
-UPDATE and DELETE operation.  (It
-is only 

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread Justin Pryzby
On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote:
> > Having now played with the patch, I'll suggest that 1000 is too high a
> > threshold.  If autovacuum runs without FREEZE, I don't see why it couldn't 
> > be
> > much lower (10?) or use (0.2 * n_ins + 50) like the other autovacuum 
> > GUC.
> 
> ISTM that the danger of regressing workloads due to suddenly repeatedly
> scanning huge indexes that previously were never / rarely scanned is
> significant (if there's a few dead tuples, otherwise most indexes will
> be able to skip the scan since the vacuum_cleanup_index_scale_factor
> introduction).

We could try to avoid that issue here:

|/* If any tuples need to be deleted, perform final vacuum cycle */
|/* XXX put a threshold on min number of tuples here? */
|if (dead_tuples->num_tuples > 0)
|{
|/* Work on all the indexes, and then the heap */
|lazy_vacuum_all_indexes(onerel, Irel, indstats, vacrelstats,
|lps, nindexes);
|
|/* Remove tuples from heap */
|lazy_vacuum_heap(onerel, vacrelstats);
|}

As you said, an insert-only table can skip scanning indexes, but an
insert-mostly table currently cannot.

Maybe we could skip the final index scan if we hit the autovacuum insert
threshold?

I still don't like mixing the thresholds with the behavior they imply, but
maybe what's needed is better docs describing all of vacuum's roles and its
procedure and priority in executing them.

The dead tuples would just be cleaned up during a future vacuum, right ?  So
that would be less efficient, but (no surprise) there's a balance to strike and
that can be tuned.  I think that wouldn't be an issue for most people; the
worst case would be if you set high maint_work_mem, and low insert threshold,
and you got increased bloat.  But faster vacuum if we avoided idx scans.

That might allow more flexibility in our discussion around default values for
thresholds for insert-triggered vacuum.

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread Laurenz Albe
On Tue, 2020-03-17 at 18:02 -0700, Andres Freund wrote:
> I don't think a default scale factor of 0 is going to be ok. For
> large-ish tables this will basically cause permanent vacuums. And it'll
> sometimes trigger for tables that actually coped well so far. 10 million
> rows could be a few seconds, not more.
> 
> I don't think that the argument that otherwise a table might not get
> vacuumed before autovacuum_freeze_max_age is convincing enough.
> 
> a) if that's indeed the argument, we should increase the default
>   autovacuum_freeze_max_age - now that there's insert triggered vacuums,
>   the main argument against that from before isn't valid anymore.
> 
> b) there's not really a good argument for vacuuming more often than
>   autovacuum_freeze_max_age for such tables. It'll not be frequent
>   enough to allow IOS for new data, and you're not preventing
>   anti-wraparound vacuums from happening.

According to my reckoning, that is the remaining objection to the patch
as it is (with ordinary freezing behavior).

How about a scale_factor of 0.005?  That will be high enough for large
tables, which seem to be the main concern here.

I fully agree with your point a) - should that be part of the patch?

I am not sure about b).  In my mind, the objective is not to prevent
anti-wraparound vacuums, but to see that they have less work to do,
because previous autovacuum runs already have frozen anything older than
vacuum_freeze_min_age.  So, assuming linear growth, the number of tuples
to freeze during any run would be at most one fourth of today's number
when we hit autovacuum_freeze_max_age.
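(With the defaults, that ratio comes from vacuum_freeze_min_age = 50 million
versus autovacuum_freeze_max_age = 200 million, i.e.

    50,000,000 / 200,000,000 = 1/4

of the tuples inserted over that period would still be unfrozen when the
anti-wraparound vacuum finally runs, assuming roughly linear insert activity.)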

I am still sorry to see more proactive freezing go, which would
reduce the impact for truly insert-only tables.
After sleeping on it, here is one last idea.

Granted, freezing with vacuum_freeze_min_age = 0 poses a problem
for those parts of the table that will receive updates or deletes.
But what if insert-triggered vacuum operates with - say -
one tenth of vacuum_freeze_min_age (unless explicitly overridden
for the table)?  That might still be high enough not to needlessly
freeze too many tuples that will still be modified, but it will
reduce the impact on insert-only tables.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread Laurenz Albe
On Tue, 2020-03-17 at 17:26 -0700, Andres Freund wrote:
> On 2020-03-17 01:14:02 +0100, Laurenz Albe wrote:
> > lazy_check_needs_freeze() is only called for an aggressive vacuum, which
> > this isn't.
> 
> Hm? I mean some of these will be aggressive vacuums, because it's older
> than vacuum_freeze_table_age? And the lower age limit would make that
> potentially more painful, no?

You are right.  I thought of autovacuum_freeze_max_age, but not of
vacuum_freeze_table_age.

Autovacuum configuration is so woefully complicated that it makes me
feel bad to propose two more parameters :^(

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread James Coleman
On Wed, Mar 18, 2020 at 1:08 PM Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-17 21:58:53 -0400, James Coleman wrote:
> > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > > > I think Andres was thinking this would maybe be an optimization 
> > > > > independent of
> > > > > is_insert_only (?)
> > > >
> > > > I wasn't sure.
> > >
> > > I'm not sure myself - but I'm doubtful that using a 0 min age by default
> > > will be ok.
> > >
> > > I was trying to say (in a later email) that I think it might be a good
> > > compromise to opportunistically freeze if we're dirtying the page
> > > anyway, but not optimize WAL emission etc. That's a pretty simple
> > > change, and it'd address a lot of the potential performance regressions,
> > > while still freezing for the "first" vacuum in insert only workloads.
> >
> > If we have truly insert-only tables, then doesn't vacuuming with
> > freezing every tuple actually decrease total vacuum cost (perhaps
> > significantly) since otherwise every vacuum keeps having to scan the
> > heap for dead tuples on pages where we know there are none? Those
> > pages could conceptually be frozen and ignored, but are not frozen
> > because of the default behavior, correct?
>
> Yes.
>
>
> > If that's all true, it seems to me that removing that part of the
> > patch significantly lowers its value.
>
> Well, perfect sometimes is the enemy of the good. We gotta get something
> in, and having some automated vacuuming for insert mostly/only tables is
> a huge step forward. And avoiding regressions is an important part of
> doing so.

Yep, as I responded to Justin, in thinking about the details I'd lost
sight of the biggest issue.

So I withdraw that concern in favor of getting something out that
improves things now.

...

> > If we opportunistically freeze only if we're already dirtying a page,
> > would that help a truly insert-only workload?
>
> Yes.

Only if some other process hasn't already read and caused hint bits to
be written, correct? Or am I missing something there too?

> > E.g., are there hint bits on the page that would need to change the
> > first time we vacuum a full page with no dead tuples?
>
> Yes. HEAP_XMIN_COMMITTED.

This can be set opportunistically by other non-vacuum processes though?

> > I would have assumed the answer was "no" (since if so I think it would
> > follow that _all_ pages need updated the first time they're
> > vacuumed?).
>
> That is the case. Although they might already be set when the tuples are
> accessed for other reasons.

Ah, I think this is answering what I'd asked above.

I'm very excited to see improvements in flight on this use case.

Thanks,
James




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread James Coleman
On Tue, Mar 17, 2020 at 11:37 PM Justin Pryzby  wrote:
>
> On Tue, Mar 17, 2020 at 09:58:53PM -0400, James Coleman wrote:
> > On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
> > >
> > > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > > > I think Andres was thinking this would maybe be an optimization 
> > > > > independent of
> > > > > is_insert_only (?)
> > > >
> > > > I wasn't sure.
> > >
> > > I'm not sure myself - but I'm doubtful that using a 0 min age by default
> > > will be ok.
> > >
> > > I was trying to say (in a later email) that I think it might be a good
> > > compromise to opportunistically freeze if we're dirtying the page
> > > anyway, but not optimize WAL emission etc. That's a pretty simple
> > > change, and it'd address a lot of the potential performance regressions,
> > > while still freezing for the "first" vacuum in insert only workloads.
> >
> > If we have truly insert-only tables, then doesn't vacuuming with
> > freezing every tuple actually decrease total vacuum cost (perhaps
> > significantly) since otherwise every vacuum keeps having to scan the
> > heap for dead tuples on pages where we know there are none? Those
> > pages could conceptually be frozen and ignored, but are not frozen
> > because of the default behavior, correct?
>
> The essential part of this patch is to trigger vacuum *at all* on an
> insert-only table.  Before today's updated patch, it also used FREEZE on any
> table which hit the new insert threshold.  The concern I raised is for
> insert-MOSTLY tables.  I thought it might be an issue if repeatedly freezing
> updated tuples caused vacuum to be too slow, especially if they're distributed
> in pages all across the table rather than clustered.

Yeah, for some reason I'd completely forgotten (caught up in thinking
about the best possible outcome re: freezing insert only tables) that
the bigger problem was just triggering vacuum at all on those tables.

> And I asked that the behavior (FREEZE) be configurable by a separate setting
> than the one that triggers autovacuum to run.  FREEZE is already controlled by
> the vacuum_freeze_table_age param.
>
> I think you're right that VACUUM FREEZE on an insert-only table would be less
> expensive than vacuum once without freeze and vacuum again later, which uses
> freeze.  To me, that suggests setting vacuum_freeze_table_age to a low value 
> on
> those tables.
>
> Regular vacuum avoids scanning all-visible pages, so for an insert-only table
> pages should only be vacuumed once (if frozen the 1st time) or twice (if not).
>
>  * Except when aggressive is set, we want to skip pages that are
>  * all-visible according to the visibility map, but only when we can 
> skip
>
> postgres=# CREATE TABLE t (i int) ; INSERT INTO t SELECT 
> generate_series(1,99); VACUUM VERBOSE t; VACUUM VERBOSE t;
> ...
> INFO:  "t": found 0 removable, 99 nonremovable row versions in 4425 out 
> of 4425 pages
> ...
> VACUUM
> Time: 106.038 ms
> INFO:  "t": found 0 removable, 175 nonremovable row versions in 1 out of 4425 
> pages
> VACUUM
> Time: 1.828 ms
>
> => That's its not-very-clear way of saying that it only scanned 1 page the 2nd
> time around.

I didn't realize that about the visibility map being taken into account.

> > We have tables that log each change to a business object (as I suspect
> > many transactional workloads do), and I've often thought that
> > immediately freezing every page as soon as it fills up would be a real
> > win for us.
> >
> > If that's all true, it seems to me that removing that part of the
> > patch significantly lowers its value.
>
> > If we opportunistically freeze only if we're already dirtying a page,
> > would that help a truly insert-only workload? E.g., are there hint
> > bits on the page that would need to change the first time we vacuum a
> > full page with no dead tuples? I would have assumed the answer was
> > "no" (since if so I think it would follow that _all_ pages need
> > updated the first time they're vacuumed?).
>
> You probably know that hint bits are written by the first process to access 
> the
> tuple after it was written.  I think you're asking if the first *vacuum*
> requires additional writes beyond that.  And I think vacuum wouldn't touch the
> page until it decides to freeze tuples.

I think my assumption is that (at least in our case), the first
process to access will definitely not be vacuum on any regular basis.

> I do have a patch to display the number of hint bits written and pages frozen.
> https://www.postgresql.org/message-id/flat/20200126141328.GP13621%40telsasoft.com

I'll take a look at that too.

> > But if that's the case, then this kind of opportunistic freezing wouldn't
> > help this kind of workload. Maybe there's something I'm misunderstanding
> > about how vacuum works though.
>
> I am reminding myself about vacuum with increasing frequency and usually still
> learn something new.

For sure.

Thanks,
James




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-18 Thread Andres Freund
Hi,

On 2020-03-17 21:58:53 -0400, James Coleman wrote:
> On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > > I think Andres was thinking this would maybe be an optimization 
> > > > independent of
> > > > is_insert_only (?)
> > >
> > > I wasn't sure.
> >
> > I'm not sure myself - but I'm doubtful that using a 0 min age by default
> > will be ok.
> >
> > I was trying to say (in a later email) that I think it might be a good
> > compromise to opportunistically freeze if we're dirtying the page
> > anyway, but not optimize WAL emission etc. That's a pretty simple
> > change, and it'd address a lot of the potential performance regressions,
> > while still freezing for the "first" vacuum in insert only workloads.
> 
> If we have truly insert-only tables, then doesn't vacuuming with
> freezing every tuple actually decrease total vacuum cost (perhaps
> significantly) since otherwise every vacuum keeps having to scan the
> heap for dead tuples on pages where we know there are none? Those
> pages could conceptually be frozen and ignored, but are not frozen
> because of the default behavior, correct?

Yes.


> If that's all true, it seems to me that removing that part of the
> patch significantly lowers its value.

Well, perfect sometimes is the enemy of the good. We gotta get something
in, and having some automated vacuuming for insert mostly/only tables is
a huge step forward. And avoiding regressions is an important part of
doing so.

I outlined the steps we could take to allow for more aggressive
vacuuming upthread.


> If we opportunistically freeze only if we're already dirtying a page,
> would that help a truly insert-only workload?

Yes.


> E.g., are there hint bits on the page that would need to change the
> first time we vacuum a full page with no dead tuples?

Yes. HEAP_XMIN_COMMITTED.


> I would have assumed the answer was "no" (since if so I think it would
> follow that _all_ pages need updated the first time they're
> vacuumed?).

That is the case. Although they might already be set when the tuples are
accessed for other reasons.


Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 09:58:53PM -0400, James Coleman wrote:
> On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
> >
> > On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > > I think Andres was thinking this would maybe be an optimization 
> > > > independent of
> > > > is_insert_only (?)
> > >
> > > I wasn't sure.
> >
> > I'm not sure myself - but I'm doubtful that using a 0 min age by default
> > will be ok.
> >
> > I was trying to say (in a later email) that I think it might be a good
> > compromise to opportunistically freeze if we're dirtying the page
> > anyway, but not optimize WAL emission etc. That's a pretty simple
> > change, and it'd address a lot of the potential performance regressions,
> > while still freezing for the "first" vacuum in insert only workloads.
> 
> If we have truly insert-only tables, then doesn't vacuuming with
> freezing every tuple actually decrease total vacuum cost (perhaps
> significantly) since otherwise every vacuum keeps having to scan the
> heap for dead tuples on pages where we know there are none? Those
> pages could conceptually be frozen and ignored, but are not frozen
> because of the default behavior, correct?

The essential part of this patch is to trigger vacuum *at all* on an
insert-only table.  Before today's updated patch, it also used FREEZE on any
table which hit the new insert threshold.  The concern I raised is for
insert-MOSTLY tables.  I thought it might be an issue if repeatedly freezing
updated tuples caused vacuum to be too slow, especially if they're distributed
in pages all across the table rather than clustered.

And I asked that the behavior (FREEZE) be configurable by a separate setting
than the one that triggers autovacuum to run.  FREEZE is already controlled by
the vacuum_freeze_table_age param.

I think you're right that VACUUM FREEZE on an insert-only table would be less
expensive than vacuum once without freeze and vacuum again later, which uses
freeze.  To me, that suggests setting vacuum_freeze_table_age to a low value on
those tables.

Regular vacuum avoids scanning all-visible pages, so for an insert-only table
pages should only be vacuumed once (if frozen the 1st time) or twice (if not).

 * Except when aggressive is set, we want to skip pages that are
 * all-visible according to the visibility map, but only when we can 
skip

postgres=# CREATE TABLE t (i int) ; INSERT INTO t SELECT 
generate_series(1,99); VACUUM VERBOSE t; VACUUM VERBOSE t;
...
INFO:  "t": found 0 removable, 99 nonremovable row versions in 4425 out of 
4425 pages
...
VACUUM
Time: 106.038 ms
INFO:  "t": found 0 removable, 175 nonremovable row versions in 1 out of 4425 
pages
VACUUM
Time: 1.828 ms

=> That's its not-very-clear way of saying that it only scanned 1 page the 2nd
time around.

> We have tables that log each change to a business object (as I suspect
> many transactional workloads do), and I've often thought that
> immediately freezing every page as soon as it fills up would be a real
> win for us.
> 
> If that's all true, it seems to me that removing that part of the
> patch significantly lowers its value.

> If we opportunistically freeze only if we're already dirtying a page,
> would that help a truly insert-only workload? E.g., are there hint
> bits on the page that would need to change the first time we vacuum a
> full page with no dead tuples? I would have assumed the answer was
> "no" (since if so I think it would follow that _all_ pages need
> updated the first time they're vacuumed?).

You probably know that hint bits are written by the first process to access the
tuple after it was written.  I think you're asking if the first *vacuum*
requires additional writes beyond that.  And I think vacuum wouldn't touch the
page until it decides to freeze tuples.

I do have a patch to display the number of hint bits written and pages frozen.
https://www.postgresql.org/message-id/flat/20200126141328.GP13621%40telsasoft.com

> But if that's the case, then this kind of opportunistic freezing wouldn't
> help this kind of workload. Maybe there's something I'm misunderstanding
> about how vacuum works though.

I am reminding myself about vacuum with increasing frequency and usually still
learn something new.

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread James Coleman
On Tue, Mar 17, 2020 at 9:03 PM Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > > I think Andres was thinking this would maybe be an optimization 
> > > independent of
> > > is_insert_only (?)
> >
> > I wasn't sure.
>
> I'm not sure myself - but I'm doubtful that using a 0 min age by default
> will be ok.
>
> I was trying to say (in a later email) that I think it might be a good
> compromise to opportunistically freeze if we're dirtying the page
> anyway, but not optimize WAL emission etc. That's a pretty simple
> change, and it'd address a lot of the potential performance regressions,
> while still freezing for the "first" vacuum in insert only workloads.

If we have truly insert-only tables, then doesn't vacuuming with
freezing every tuple actually decrease total vacuum cost (perhaps
significantly) since otherwise every vacuum keeps having to scan the
heap for dead tuples on pages where we know there are none? Those
pages could conceptually be frozen and ignored, but are not frozen
because of the default behavior, correct?

We have tables that log each change to a business object (as I suspect
many transactional workloads do), and I've often thought that
immediately freezing every page as soon as it fills up would be a real
win for us.

If that's all true, it seems to me that removing that part of the
patch significantly lowers its value.

If we opportunistically freeze only if we're already dirtying a page,
would that help a truly insert-only workload? E.g., are there hint
bits on the page that would need to change the first time we vacuum a
full page with no dead tuples? I would have assumed the answer was
"no" (since if so I think it would follow that _all_ pages need
updated the first time they're vacuumed?). But if that's the case,
then this kind of opportunistic freezing wouldn't help this kind of
workload. Maybe there's something I'm misunderstanding about how
vacuum works though.

Thanks,
James




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Andres Freund
Hi,

On 2020-03-17 20:42:07 +0100, Laurenz Albe wrote:
> > I think Andres was thinking this would maybe be an optimization independent 
> > of
> > is_insert_only (?)
>
> I wasn't sure.

I'm not sure myself - but I'm doubtful that using a 0 min age by default
will be ok.

I was trying to say (in a later email) that I think it might be a good
compromise to opportunistically freeze if we're dirtying the page
anyway, but not optimize WAL emission etc. That's a pretty simple
change, and it'd address a lot of the potential performance regressions,
while still freezing for the "first" vacuum in insert only workloads.


> Add "autovacuum_vacuum_insert_threshold" and
> "autovacuum_vacuum_insert_scale_factor" GUC and reloption.
> The default value for the threshold is 1000.
> The scale factor defaults to 0, which means that it is
> effectively disabled, but it offers some flexibility
> to tune the feature similar to other autovacuum knobs.

I don't think a default scale factor of 0 is going to be ok. For
large-ish tables this will basically cause permanent vacuums. And it'll
sometimes trigger for tables that actually coped well so far. 10 million
rows could be a few seconds, not more.

I don't think that the argument that otherwise a table might not get
vacuumed before autovacuum_freeze_max_age is convincing enough.

a) if that's indeed the argument, we should increase the default
  autovacuum_freeze_max_age - now that there's insert triggered vacuums,
  the main argument against that from before isn't valid anymore.

b) there's not really a good argument for vacuuming more often than
  autovacuum_freeze_max_age for such tables. It'll not be frequent
  enough to allow IOS for new data, and you're not preventing
  anti-wraparound vacuums from happening.

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Andres Freund
Hi,

On 2020-03-17 01:14:02 +0100, Laurenz Albe wrote:
> lazy_check_needs_freeze() is only called for an aggressive vacuum, which
> this isn't.

Hm? I mean some of these will be aggressive vacuums, because it's older
than vacuum_freeze_table_age? And the lower age limit would make that
potentially more painful, no?

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Mon, Mar 16, 2020 at 07:47:13AM -0500, Justin Pryzby wrote:
> Normally, when someone complains about bad plan related to no index-onlyscan,
> we tell them to run vacuum, and if that helps, then ALTER TABLE .. SET
> (autovacuum_vacuum_scale_factor=0.005).
> 
> If there's two thresholds (4 GUCs and 4 relopts) for autovacuum, then do we
> have to help determine which one was being hit, and which relopt to set?

I don't think we came to any resolution on this.

Right now, to encourage IOS, we'd tell someone to set
autovacuum_vacuum_scale_factor=0.005.  That wouldn't work for an insert-only
table, but I've never heard back from someone that it didn't work.

So with this patch, we'd maybe tell them to do this, to also get IOS on
insert-only tables ?
|ALTER TABLE .. SET (autovacuum_vacuum_scale_factor=0.005, autovacuum_vacuum_insert_threshold=5);

> I wonder if the new insert GUCs should default to -1 (disabled)?  And the
> insert thresholds should be set by new insert relopt (if set), or by new 
> insert
> GUC (default -1), else normal relopt, or normal GUC.  The defaults would give
> 50 + 0.20*n.  When someone asks about IOS, we'd tell them to set
> autovacuum_vacuum_scale_factor=0.005, same as now.
> 
> vac_ins_scale_factor =
>   (relopts && relopts->vacuum_ins_scale_factor >= 0) ? relopts->vacuum_ins_scale_factor :
>   autovacuum_vac_ins_scale >= 0 ? autovacuum_vac_ins_scale :
>   (relopts && relopts->vacuum_scale_factor >= 0) ? relopts->vacuum_scale_factor :
>   autovacuum_vac_scale;

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Laurenz Albe
On Tue, 2020-03-17 at 16:34 -0500, Justin Pryzby wrote:
> > > > Now we insert m number tuples (which are live).
> 
> .. but not yet counted in reltuples.

Thanks for pointing out my mistake.

Here is another patch, no changes except setting the upper limit
for autovacuum_vacuum_insert_scale_factor to 1e10.

Yours,
Laurenz Albe
From cc44042d4a07804a21abe7ad54a8dfafd3162228 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Tue, 17 Mar 2020 22:51:46 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_threshold" and
"autovacuum_vacuum_insert_scale_factor" GUC and reloption.
The default value for the threshold is 1000.
The scale factor defaults to 0, which means that by
default the table size doesn't contribute,
but this way we have some flexibility to tune the
feature similar to other autovacuum knobs.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed.

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.  This value is available
in "pg_stat_*_tables" as "n_ins_since_vacuum".

Author: Laurenz Albe, based on a suggestion from Darafei Praliaskouski
Reviewed-by: David Rowley, Justin Pryzby, Masahiko Sawada, Andres Freund
Discussion: https://postgr.es/m/CAC8Q8t+j36G_bLF=+0imo6jgnwnlnwb1tujxujr-+x8zcct...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 41 +++
 doc/src/sgml/maintenance.sgml |  5 +++
 doc/src/sgml/monitoring.sgml  |  5 +++
 doc/src/sgml/ref/create_table.sgml| 30 ++
 src/backend/access/common/reloptions.c| 22 ++
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/postmaster/autovacuum.c   | 22 --
 src/backend/postmaster/pgstat.c   |  5 +++
 src/backend/utils/adt/pgstatfuncs.c   | 16 
 src/backend/utils/misc/guc.c  | 20 +
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  4 ++
 src/include/catalog/pg_proc.dat   |  5 +++
 src/include/pgstat.h  |  1 +
 src/include/postmaster/autovacuum.h   |  2 +
 src/include/utils/rel.h   |  2 +
 src/test/regress/expected/rules.out   |  3 ++
 17 files changed, 185 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1128f89ec..0ed1bb9d5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7244,6 +7244,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_vacuum_insert_threshold (integer)
+  
+   autovacuum_vacuum_insert_threshold
+   configuration parameter
+  
+  
+  
+   
+Specifies the number of inserted tuples needed to trigger a
+VACUUM in any one table.
+The default is 1000 tuples.
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
+   
+  
+ 
+
  
   autovacuum_analyze_threshold (integer)
   
@@ -7285,6 +7305,27 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_vacuum_insert_scale_factor (floating point)
+  
+   autovacuum_vacuum_insert_scale_factor
+   configuration parameter
+  
+  
+  
+   
+Specifies a fraction of the table size to add to
+autovacuum_vacuum_insert_threshold
+when deciding whether to trigger a VACUUM.
+The default is 0.0, which means that the table size has no effect.
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
+   
+  
+ 
+
  
   autovacuum_analyze_scale_factor (floating point)
   
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index ec8bdcd7a4..dbf418c62a 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -786,6 +786,11 @@ vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuple
 vacuum is performed to freeze old tuples and advance
 relfrozenxid; otherwise, only pages that have been modified
 since the last vacuum are scanned.
+Finally, a threshold similar to the above is calculated from
+ and
+.
+Tables that have received more inserts than the 

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 10:22:44PM +0100, Laurenz Albe wrote:
> On Tue, 2020-03-17 at 16:07 -0500, Justin Pryzby wrote:
> > > Assume a scale factor >= 1, for example 2, and n live tuples.
> > > The table has just been vacuumed.
> > > 
> > > Now we insert m number tuples (which are live).

.. but not yet counted in reltuples.

On Tue, Mar 17, 2020 at 10:22:44PM +0100, Laurenz Albe wrote:
> Note that this is different from autovacuum_vacuum_scale_factor,
> because inserted tuples are live, while dead tuples are not.

But they're not counted in reltuples until after the next vacuum (or analyze),
which is circular, since it's exactly what we're trying to schedule.

reltuples = classForm->reltuples;
vactuples = tabentry->n_dead_tuples;
+   instuples = tabentry->inserts_since_vacuum;
anltuples = tabentry->changes_since_analyze;
 
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
+   vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;
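
For example, with the patch applied, a rough way to watch how close a table is
to its insert threshold (hard-coding hypothetical settings of threshold = 1000
and scale factor = 0.2) would be:

    SELECT s.relname,
           s.n_ins_since_vacuum,
           1000 + 0.2 * c.reltuples AS approx_insert_threshold
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid;

where reltuples is still the estimate from the last vacuum/analyze; the new
inserts only show up in n_ins_since_vacuum, which is exactly the point.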

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Laurenz Albe
On Tue, 2020-03-17 at 16:07 -0500, Justin Pryzby wrote:
> > Assume a scale factor >= 1, for example 2, and n live tuples.
> > The table has just been vacuumed.
> > 
> > Now we insert m number tuples (which are live).
> > 
> > Then the condition
> > 
> >threshold + scale_factor * live_tuples < newly_inserted_tuples
> > 
> > becomes
> > 
> >1000 + 2 * (n + m) < m
> > 
> > which can never be true for non-negative n and m.
> > 
> > So a scale factor >= 1 disables the feature.
> 
> No, this is what we mailed about privately yesterday, and I demonstrated that
> autovac can still run with factor=100.  I said:

I remember.
Can you point out where exactly the flaw in my reasoning is?

> > It's a multiplier, not a percent out of 100 (fraction is not a great choice 
> > of
> > words).
> > 
> > &autovacuum_vac_scale,
> > 0.2, 0.0, 100.0,
> > 
> > The default is 0.2 (20%), so 100 means after updating/deleting 
> > 100*reltuples.

Yes, exactly.

> If 1.0 disabled the feature, it wouldn't make much sense to allow factor up to
> 100.

True, we could set the upper limit to 2, but it doesn't matter much.

Note that this is different from autovacuum_vacuum_scale_factor,
because inserted tuples are live, while dead tuples are not.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 10:01:15PM +0100, Laurenz Albe wrote:
> On Tue, 2020-03-17 at 14:56 -0500, Justin Pryzby wrote:
> > I still suggest scale_factor maximum of 1e10, like
> > 4d54543efa5eb074ead4d0fadb2af4161c943044
> > 
> > Which allows more effectively disabling it than a factor of 100, which would
> > progress like: ~1, 1e2, 1e4, 1e6, 1e8, 1e10, ..
> > 
> > I don't think that 1e4 would be a problem, but 1e6 and 1e8 could be.  With
> > 1e10, it's first vacuumed when there's 10 billion inserts, if we didn't
> > previously hit the n_dead threshold.
> > 
> > I think that's ok?  If one wanted to disable it up to 1e11 tuples, I think
> > they'd disable autovacuum, or preferably just implement a vacuum job.
> 
> Assume a scale factor >= 1, for example 2, and n live tuples.
> The table has just been vacuumed.
> 
> Now we insert m number tuples (which are live).
> 
> Then the condition
> 
>   threshold + scale_factor * live_tuples < newly_inserted_tuples
> 
> becomes
> 
>   1000 + 2 * (n + m) < m
> 
> which can never be true for non-negative n and m.
> 
> So a scale factor >= 1 disables the feature.

No, this is what we mailed about privately yesterday, and I demonstrated that
autovac can still run with factor=100.  I said:

|It's a multiplier, not a percent out of 100 (fraction is not a great choice of
|words).
|
|&autovacuum_vac_scale,
|0.2, 0.0, 100.0,
|
|The default is 0.2 (20%), so 100 means after updating/deleting 100*reltuples.

live tuples is an estimate, from the most recent vacuum OR analyze.

If 1.0 disabled the feature, it wouldn't make much sense to allow factor up to
100.

+   {
+   {"autovacuum_vacuum_insert_scale_factor", PGC_SIGHUP, 
AUTOVACUUM,
+   gettext_noop("Number of tuple inserts prior to vacuum 
as a fraction of reltuples."),
+   NULL
+   },
+   &autovacuum_vac_ins_scale,
+   0.0, 0.0, 100.0,
+   NULL, NULL, NULL
+   },
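
To illustrate (numbers made up): if the last vacuum or analyze left
reltuples = 1,000,000, then a factor of 100 puts the trigger point at

    1000 + 100 * 1,000,000 = 100,001,000 inserted tuples

which is very infrequent but still reachable, precisely because the new
inserts are not reflected in reltuples until the next vacuum or analyze.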

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Laurenz Albe
On Tue, 2020-03-17 at 14:56 -0500, Justin Pryzby wrote:
> I still suggest scale_factor maximum of 1e10, like
> 4d54543efa5eb074ead4d0fadb2af4161c943044
> 
> Which allows more effectively disabling it than a factor of 100, which would
> progress like: ~1, 1e2, 1e4, 1e6, 1e8, 1e10, ..
> 
> I don't think that 1e4 would be a problem, but 1e6 and 1e8 could be.  With
> 1e10, it's first vacuumed when there's 10 billion inserts, if we didn't
> previously hit the n_dead threshold.
> 
> I think that's ok?  If one wanted to disable it up to 1e11 tuples, I think
> they'd disable autovacuum, or preferably just implement a vacuum job.

Assume a scale factor >= 1, for example 2, and n live tuples.
The table has just been vacuumed.

Now we insert m number tuples (which are live).

Then the condition

  threshold + scale_factor * live_tuples < newly_inserted_tuples

becomes

  1000 + 2 * (n + m) < m

which can never be true for non-negative n and m.

So a scale factor >= 1 disables the feature.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 08:42:07PM +0100, Laurenz Albe wrote:
> Also, since aggressive^H^H^H^H^H^H^H^H^H^Hproactive freezing seems to be a
> performance problem in some cases (pages with UPDATEs and DELETEs in otherwise
> INSERT-mostly tables), I have done away with the whole freezing thing,
> which made the whole patch much smaller and simpler.
> 
> Now all that is introduced are the threshold and scale factor and
> the new statistics counter to track the number of inserts since the last
> VACUUM.
> 
> Updated patch attached.
> 
> Perhaps we can reach a consensus on this reduced functionality.

+1

I still suggest scale_factor maximum of 1e10, like
4d54543efa5eb074ead4d0fadb2af4161c943044

Which allows more effectively disabling it than a factor of 100, which would
progress like: ~1, 1e2, 1e4, 1e6, 1e8, 1e10, ..

I don't think that 1e4 would be a problem, but 1e6 and 1e8 could be.  With
1e10, it's first vacuumed when there's 10 billion inserts, if we didn't
previously hit the n_dead threshold.

I think that's ok?  If one wanted to disable it up to 1e11 tuples, I think
they'd disable autovacuum, or preferably just implement a vacuum job.
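
For example (hypothetical table and database names), disabling it for one
table and scheduling an external job could look like:

    ALTER TABLE events SET (autovacuum_enabled = off);

plus a cron entry running something like "vacuumdb --table=events mydb" on
whatever schedule suits the workload.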

The commit message says:
|The scale factor defaults to 0, which means that it is
|effectively disabled, but it offers some flexibility
..but "it" is ambiguous, so should say something like: "the table size does not
contribute to the autovacuum threshold".

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Laurenz Albe
On Tue, 2020-03-17 at 10:24 -0500, Justin Pryzby wrote:
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -1388,17 +1388,26 @@ lazy_scan_heap(Relation onerel, VacuumParams 
> > *params, LVRelStats *vacrelstats,
> >else
> >{
> >booltuple_totally_frozen;
> > + boolfreeze_all;
> >   
> >num_tuples += 1;
> >hastup = true;
> >   
> > + /*
> > +  * If any tuple was already frozen in the block and this is
> > +  * an insert-only vacuum, we might as well freeze all other
> > +  * tuples in that block.
> > +  */
> > + freeze_all = params->is_insert_only && has_dead_tuples;
> > +
> 
> You're checking if any (previously-scanned) tuple was *dead*, but I think you
> need to check nfrozen>=0.

Yes, that was a silly typo.

> Also, this will fail to freeze tuples on a page which *could* be
> opportunistically-frozen, but *follow* the first tuple which *needs* to be
> frozen.

I am aware of that.  I was trying to see if that went in the direction that
Andres intends before trying more invasive modifications.

> I think Andres was thinking this would maybe be an optimization independent of
> is_insert_only (?)

I wasn't sure.

In the light of that, I have ripped out that code again.

Also, since aggressive^H^H^H^H^H^H^H^H^H^Hproactive freezing seems to be a
performance problem in some cases (pages with UPDATEs and DELETEs in otherwise
INSERT-mostly tables), I have done away with the whole freezing thing,
which made the whole patch much smaller and simpler.

Now all that is introduced are the threshold and scale factor and
the new statistics counter to track the number of inserts since the last
VACUUM.

> > + /* normal autovacuum shouldn't freeze aggressively */
> > + *insert_only = false;
> 
> Aggressively is a bad choice of words.  In the context of vacuum, it usually
> means "visit all pages, even those which are allvisible".

This is gone in the latest patch.

Updated patch attached.

Perhaps we can reach a consensus on this reduced functionality.

Yours,
Laurenz Albe
From 547481033898f6e8e028e45684d4bbaa86d6bc9c Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Tue, 17 Mar 2020 20:31:12 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_threshold" and
"autovacuum_vacuum_insert_scale_factor" GUC and reloption.
The default value for the threshold is 1000.
The scale factor defaults to 0, which means that it is
effectively disabled, but it offers some flexibility
to tune the feature similar to other autovacuum knobs.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed.

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.  This value is available
in "pg_stat_*_tables" as "n_ins_since_vacuum".

Author: Laurenz Albe, based on a suggestion from Darafei Praliaskouski
Reviewed-by: David Rowley, Justin Pryzby, Masahiko Sawada, Andres Freund
Discussion: https://postgr.es/m/CAC8Q8t+j36G_bLF=+0imo6jgnwnlnwb1tujxujr-+x8zcct...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 41 +++
 doc/src/sgml/maintenance.sgml |  5 +++
 doc/src/sgml/monitoring.sgml  |  5 +++
 doc/src/sgml/ref/create_table.sgml| 30 ++
 src/backend/access/common/reloptions.c| 22 ++
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/postmaster/autovacuum.c   | 22 --
 src/backend/postmaster/pgstat.c   |  5 +++
 src/backend/utils/adt/pgstatfuncs.c   | 16 
 src/backend/utils/misc/guc.c  | 20 +
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  4 ++
 src/include/catalog/pg_proc.dat   |  5 +++
 src/include/pgstat.h  |  1 +
 src/include/postmaster/autovacuum.h   |  2 +
 src/include/utils/rel.h   |  2 +
 src/test/regress/expected/rules.out   |  3 ++
 17 files changed, 185 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1128f89ec..0ed1bb9d5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7244,6 +7244,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WI

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-17 Thread Justin Pryzby
On Tue, Mar 17, 2020 at 01:14:02AM +0100, Laurenz Albe wrote:
> lazy_check_needs_freeze() is only called for an aggressive vacuum, which
> this isn't.

> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1388,17 +1388,26 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, 
> LVRelStats *vacrelstats,
>   else
>   {
>   booltuple_totally_frozen;
> + boolfreeze_all;
>  
>   num_tuples += 1;
>   hastup = true;
>  
> + /*
> +  * If any tuple was already frozen in the block and this is
> +  * an insert-only vacuum, we might as well freeze all other
> +  * tuples in that block.
> +  */
> + freeze_all = params->is_insert_only && has_dead_tuples;
> +

You're checking if any (previously-scanned) tuple was *dead*, but I think you
need to check nfrozen>=0.

Also, this will fail to freeze tuples on a page which *could* be
opportunistically-frozen, but *follow* the first tuple which *needs* to be
frozen.

I think Andres was thinking this would maybe be an optimization independent of
is_insert_only (?)

>   /*
>* Each non-removable tuple must be checked to see if it needs
>* freezing.  Note we already have exclusive buffer lock.
>*/
>   if (heap_prepare_freeze_tuple(tuple.t_data,
>                                 relfrozenxid, relminmxid,
> -                               FreezeLimit, MultiXactCutoff,
> +                               freeze_all ? 0 : FreezeLimit,
> +                               freeze_all ? 0 : MultiXactCutoff,
>                                 &frozen[nfrozen],
>                                 &tuple_totally_frozen))

> + /* normal autovacuum shouldn't freeze aggressively */
> + *insert_only = false;

Aggressively is a bad choice of words.  In the context of vacuum, it usually
means "visit all pages, even those which are allvisible".

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Laurenz Albe
On Mon, 2020-03-16 at 14:34 -0700, Andres Freund wrote:
> > > In particularl, I think it'd make sense to *not* have a lower freezing
> > > horizon for insert vacuums (because it *will* cause problems), but if
> > > the page is dirty anyway, then do the freezing even if freeze_min_age
> > > etc would otherwise prevent us from doing so?
> > 
> > I don't quite see why freezing tuples in insert-only tables will cause
> > problems - are you saying that more WAL will be written compared to
> > freezing with a higher freeze_min_age?
> 
> As far as I understand the patch may trigger additional vacuums e.g. for
> tables that have some heavily updated parts / key ranges, and otherwise
> are largely insert only (as long as there are in total considerably more
> inserts than updates). That's not at all uncommon.
> 
> And for the heavily updated regions the additional vacuums with a 0 min
> age could prove to be costly.  I've not looked at the new code, but it'd
> be particularly bad if the changes were to trigger the
> lazy_check_needs_freeze() check in lazy_scan_heap() - it'd have the
> potential for a lot more contention.

I think I got it.

Here is a version of the patch that does *not* freeze more tuples than
normal, except if a prior tuple on the same page is already eligible for 
freezing.

lazy_check_needs_freeze() is only called for an aggressive vacuum, which
this isn't.

Does that look sane?

Yours,
Laurenz Albe
From abf3c092e016bbf19059fc104669e92a8de18462 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Tue, 17 Mar 2020 01:02:56 +0100
Subject: [PATCH] Autovacuum tables that have received only inserts

Add "autovacuum_vacuum_insert_threshold" and
"autovacuum_vacuum_insert_scale_factor" GUC and reloption.
The default value for the threshold is 1000.
The scale factor defaults to 0, which means that it is
effectively disabled, but it offers some flexibility
to tune the feature similar to other autovacuum knobs.

Any table that has received more inserts since it was
last vacuumed (and that is not vacuumed for another
reason) will be autovacuumed.
During such a vacuum run, freeze all tuples in a page
that has already been dirtied for any other reason.
This should cause little extra overhead.

This avoids the known problem that insert-only tables
are never autovacuumed until they need to have their
anti-wraparound autovacuum, which then can be massive
and disruptive.

To track the number of inserts since the last vacuum,
introduce a StatTabEntry "inserts_since_vacuum" that
gets reset to 0 after a vacuum.  This value is available
in "pg_stat_*_tables" as "n_ins_since_vacuum".

Author: Laurenz Albe, based on a suggestion from Darafei Praliaskouski
Reviewed-by: David Rowley, Justin Pryzby, Masahiko Sawada, Andres Freund
Discussion: https://postgr.es/m/CAC8Q8t+j36G_bLF=+0imo6jgnwnlnwb1tujxujr-+x8zcct...@mail.gmail.com
---
 doc/src/sgml/config.sgml  | 41 ++
 doc/src/sgml/maintenance.sgml |  9 
 doc/src/sgml/monitoring.sgml  |  5 ++
 doc/src/sgml/ref/create_table.sgml| 30 +++
 src/backend/access/common/reloptions.c| 22 
 src/backend/access/heap/vacuumlazy.c  | 11 +++-
 src/backend/catalog/system_views.sql  |  1 +
 src/backend/commands/vacuum.c |  3 ++
 src/backend/postmaster/autovacuum.c   | 53 ---
 src/backend/postmaster/pgstat.c   |  5 ++
 src/backend/utils/adt/pgstatfuncs.c   | 16 ++
 src/backend/utils/misc/guc.c  | 20 +++
 src/backend/utils/misc/postgresql.conf.sample |  4 ++
 src/bin/psql/tab-complete.c   |  4 ++
 src/include/catalog/pg_proc.dat   |  5 ++
 src/include/commands/vacuum.h |  1 +
 src/include/pgstat.h  |  1 +
 src/include/postmaster/autovacuum.h   |  2 +
 src/include/utils/rel.h   |  2 +
 src/test/regress/expected/rules.out   |  3 ++
 20 files changed, 230 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c1128f89ec..0ed1bb9d5e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -7244,6 +7244,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  autovacuum_vacuum_insert_threshold (integer)
+  
+   autovacuum_vacuum_insert_threshold
+   configuration parameter
+  
+  
+  
+   
+Specifies the number of inserted tuples needed to trigger a
+VACUUM in any one table.
+The default is 1000 tuples.
+This parameter can only be set in the postgresql.conf
+file or on the server command line;
+but the setting can be overridden for individual tables by
+changing table storage parameters.
+   
+  
+ 
+
  
   autovacuum_analyze_threshold (integer)
   
@@ -7285,6 +7305,27 @@ COPY postgr

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Andres Freund
Hi,

On 2020-03-16 22:25:11 +0100, Laurenz Albe wrote:
> On Mon, 2020-03-16 at 13:13 -0700, Andres Freund wrote:
> > > Freezing tuples is the point of this patch.
> > 
> > Sure. But not hurting existing installations is also a goal of the
> > patch. Since this is introducing potentially significant performance
> > downsides, I think it's good to be a bit conservative with the default
> > configuration.
> > 
> > I'm getting a bit more bullish on implementing some of what I
> > discussed in
> > https://www.postgresql.org/message-id/20200313213851.ejrk5gptnmp65uoo%40alap3.anarazel.de
> > at the same time as this patch.
> >
> > In particularl, I think it'd make sense to *not* have a lower freezing
> > horizon for insert vacuums (because it *will* cause problems), but if
> > the page is dirty anyway, then do the freezing even if freeze_min_age
> > etc would otherwise prevent us from doing so?
> 
> I don't quite see why freezing tuples in insert-only tables will cause
> problems - are you saying that more WAL will be written compared to
> freezing with a higher freeze_min_age?

As far as I understand the patch may trigger additional vacuums e.g. for
tables that have some heavily updated parts / key ranges, and otherwise
are largely insert only (as long as there are in total considerably more
inserts than updates). That's not at all uncommon.

And for the heavily updated regions the additional vacuums with a 0 min
age could prove to be costly.  I've not looked at the new code, but it'd
be particularly bad if the changes were to trigger the
lazy_check_needs_freeze() check in lazy_scan_heap() - it'd have the
potential for a lot more contention.


> > > As I have said, if you have a table where you insert many rows in few
> > > transactions, you would trigger an autovacuum that then ends up doing 
> > > nothing
> > > because none of the rows have reached vacuum_freeze_table_age yet.
> > > Then some time later you will get a really large vacuum run.
> > 
> > Well, only if you don't further insert into the table. Which isn't that
> > common a case for a table having a "really large vacuum run".
> 
> Ah, yes, you are right.
> So it actually would not be worse if we use the normal freeze_min_age
> for insert-only vacuums.

Well, it'd still be worse, because it'd likely trigger more writes of
the same pages: once for setting hint bits during the first vacuum, and
then later a second time for freezing. Which is why I was pondering using the
logic


> So do you think the patch would be ok as it is if we change only that?

I've not looked at it in enough detail so far to say either way, sorry.

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Laurenz Albe
On Mon, 2020-03-16 at 16:07 -0500, Justin Pryzby wrote:
> Best practice is to vacuum following bulk load.

Yes.

> If it's a bulk load, then I think it's okay to assume it was vacuumed,

No.  This patch is there precisely because too many people don't know
that they should vacuum their table after a bulk insert.
The idea of autovacuum is to do these things for you automatically.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Laurenz Albe
On Mon, 2020-03-16 at 13:13 -0700, Andres Freund wrote:
> > Freezing tuples is the point of this patch.
> 
> Sure. But not hurting existing installations is also a goal of the
> patch. Since this is introducing potentially significant performance
> downsides, I think it's good to be a bit conservative with the default
> configuration.
> 
> I'm getting a bit more bullish on implementing some of what I
> discussed in
> https://www.postgresql.org/message-id/20200313213851.ejrk5gptnmp65uoo%40alap3.anarazel.de
> at the same time as this patch.
>
> In particularl, I think it'd make sense to *not* have a lower freezing
> horizon for insert vacuums (because it *will* cause problems), but if
> the page is dirty anyway, then do the freezing even if freeze_min_age
> etc would otherwise prevent us from doing so?

I don't quite see why freezing tuples in insert-only tables will cause
problems - are you saying that more WAL will be written compared to
freezing with a higher freeze_min_age?

> > As I have said, if you have a table where you insert many rows in few
> > transactions, you would trigger an autovacuum that then ends up doing 
> > nothing
> > because none of the rows have reached vacuum_freeze_table_age yet.
> > Then some time later you will get a really large vacuum run.
> 
> Well, only if you don't further insert into the table. Which isn't that
> common a case for a table having a "really large vacuum run".

Ah, yes, you are right.
So it actually would not be worse if we use the normal freeze_min_age
for insert-only vacuums.

So do you think the patch would be ok as it is if we change only that?

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Justin Pryzby
On Mon, Mar 16, 2020 at 08:49:43PM +0100, Laurenz Albe wrote:
> On Mon, 2020-03-16 at 07:47 -0500, Justin Pryzby wrote:
> > It seems to me that the easy thing to do is to implement this initially without
> > FREEZE (which is controlled by vacuum_freeze_table_age), and defer until
> > July/v14 further discussion and implementation of another GUC/relopt for
> > autovacuum freezing to be controlled by insert thresholds (or ratio).
> 
> Freezing tuples is the point of this patch.
> As I have said, if you have a table where you insert many rows in few
> transactions, you would trigger an autovacuum that then ends up doing nothing
> because none of the rows have reached vacuum_freeze_table_age yet.
> 
> Then some time later you will get a really large vacuum run.

Best practice is to vacuum following bulk load.  I don't think this patch is
going to change that.  Bulk-loaded tuples will be autovacuumed, which is nice,
but I don't think it'll be ideal if large bulk loads trigger an autovacuum with
cost delays, which ISTM will take even longer if it runs with FREEZE.

If it's a bulk load, then I think it's okay to assume it was vacuumed, or
otherwise that it'll eventually be hit by autovac at some later date.

If it's not a "bulk load" but a normal runtime, and the table continues to
receive inserts/deletes, then eventually it'll hit a vacuum threshold and
tuples can be frozen.

If it receives a bunch of activity, which then stops (like a partition of a
table of timeseries data), then maybe it doesn't hit a vacuum threshold, until
wraparound vacuum.  I think in that case it's not catastrophic, since then it
wasn't big enough to hit any threshold (it's partitioned).  If, every day,
autovacuum kicks in and does a wraparound vacuum on a table with data from (say)
100 days ago, I think that's reasonable.

One case which would suck is if the insert_threshold were 1e6, and you restore
a DB with 1000 tables of historic data (which are no longer being inserted
into) which have 9e5 rows each (just below the threshold).  Then autovacuum
will hit them all at once.  The solution to that is to manually vacuum after bulk
load, same as today.  As a practical matter, some of the tables are likely to
hit the autovacuum insert threshold, and some are likely to be pruned (or
updated) before wraparound vacuum, so the patch usually does improve that case.

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Andres Freund
Hi,

On 2020-03-16 20:49:43 +0100, Laurenz Albe wrote:
> On Mon, 2020-03-16 at 07:47 -0500, Justin Pryzby wrote:
> > It seems to me that the easy thing to do is to implement this initially without
> > FREEZE (which is controlled by vacuum_freeze_table_age), and defer until
> > July/v14 further discussion and implementation of another GUC/relopt for
> > autovacuum freezing to be controlled by insert thresholds (or ratio).
> 
> Freezing tuples is the point of this patch.

Sure. But not hurting existing installation is also a goal of the
patch. Since this is introducing potentially significant performance
downsides, I think it's good to be a bit conservative with the default
configuration.

I'm getting a bit more bullish on implementing some of what I
discussed in
https://www.postgresql.org/message-id/20200313213851.ejrk5gptnmp65uoo%40alap3.anarazel.de
at the same time as this patch.

In particular, I think it'd make sense to *not* have a lower freezing
horizon for insert vacuums (because it *will* cause problems), but if
the page is dirty anyway, then do the freezing even if freeze_min_age
etc would otherwise prevent us from doing so?

It'd probably be ok to incur the WAL logging overhead unconditionally,
but I'm not sure about it.


> As I have said, if you have a table where you insert many rows in few
> transactions, you would trigger an autovacuum that then ends up doing nothing
> because none of the rows have reached vacuum_freeze_table_age yet.

> Then some time later you will get a really large vacuum run.

Well, only if you don't further insert into the table. Which isn't that
common a case for a table having a "really large vacuum run".


Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Laurenz Albe
On Mon, 2020-03-16 at 07:47 -0500, Justin Pryzby wrote:
> It seems to me that the easy thing to do is to implement this initially without
> FREEZE (which is controlled by vacuum_freeze_table_age), and defer until
> July/v14 further discussion and implementation of another GUC/relopt for
> autovacuum freezing to be controlled by insert thresholds (or ratio).

Freezing tuples is the point of this patch.
As I have said, if you have a table where you insert many rows in few
transactions, you would trigger an autovacuum that then ends up doing nothing
because none of the rows have reached vacuum_freeze_table_age yet.

Then some time later you will get a really large vacuum run.

It seems to me that if we keep trying to find the formula that will vacuum
every table just right and never do the wrong thing, we will never get
anywhere.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Andres Freund
Hi,

On 2020-03-13 19:10:00 -0500, Justin Pryzby wrote:
> On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> > > |One disadvantage of decreasing vacuum_freeze_min_age is that it might cause
> > > |VACUUM to do useless work: freezing a row version is a waste of time if the row
> > > |is modified soon thereafter (causing it to acquire a new XID). So the setting
> > > |should be large enough that rows are not frozen until they are unlikely to
> > > |change any more.
> > 
> > I think the overhead here might be a bit overstated. Once a page is
> 
> Could you clarify if you mean the language in the docs in general or specifically
> in the context of this patch?

In the docs.

Greetings,

Andres Freund




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Justin Pryzby
On Mon, Mar 16, 2020 at 12:53:43PM +0900, Masahiko Sawada wrote:

> There is already a consensus on introducing 2 new parameters, but as
> a second idea I'd like to add one (or two) GUC(s) to my suggestion,
> say autovacuum_vacuum_freeze_insert_ratio; this parameter is the ratio
> of the number of inserted tuples to the total number of tuples modified
> and inserted, used to trigger insert-only vacuum. For example,
> suppose the table has 1,000,000 tuples and we set threshold = 0,
> scale_factor = 0.2 and freeze_insert_ratio = 0.9: we will trigger
> normal autovacuum when n_dead_tup + n_ins_since_vacuum > 200,000, but
> we will instead trigger insert-only autovacuum, which is a vacuum with
> vacuum_freeze_min_age = 0, when n_ins_since_vacuum > 180,000 (= 200,000
> * 0.9). IOW, if 90% of modified tuples are insertions, we freeze tuples
> aggressively. If we want to trigger insert-only vacuum only on
> insert-only tables, we can set freeze_insert_ratio = 1.0. The downside
> of this idea is that we cannot disable autovacuum triggered by the
> number of inserted tuples, although we might be able to introduce one
> more GUC that controls whether to include the number of inserted tuples
> for triggering autovacuum (say, autovacuum_vacuum_triggered_by_insert =
> on|off). The pro of this idea would be that we can ensure that
> insert-only vacuum will run only when the ratio of
> insertions is large enough.

I was thinking about something like this myself.  I would appreciate keeping
separate the thresholds for 1) triggering vacuum; and, 2) the options
autovacuum uses when it runs (in this case, FREEZE).  Someone might want
autovacuum to run with FREEZE on a table vacuumed due to dead tuples (say, on a
partitioned table), or might *not* want to run FREEZE on a table vacuumed due
to insertions (maybe because index scans are too expensive or FREEZE makes it
too slow).

Normally, when someone complains about a bad plan related to no index-only scan,
we tell them to run vacuum, and if that helps, then ALTER TABLE .. SET
(autovacuum_vacuum_scale_factor=0.005).

If there are two thresholds (4 GUCs and 4 relopts) for autovacuum, then do we
have to help determine which one was being hit, and which relopt to set?

I wonder if the new insert GUCs should default to -1 (disabled)?  And the
insert thresholds should be set by new insert relopt (if set), or by new insert
GUC (default -1), else normal relopt, or normal GUC.  The defaults would give
50 + 0.20*n.  When someone asks about IOS, we'd tell them to set
autovacuum_vacuum_scale_factor=0.005, same as now.

vac_ins_scale_factor =
    (relopts && relopts->vacuum_ins_scale_factor >= 0) ? relopts->vacuum_ins_scale_factor :
    autovacuum_vac_ins_scale >= 0 ? autovacuum_vac_ins_scale :
    (relopts && relopts->vacuum_scale_factor >= 0) ? relopts->vacuum_scale_factor :
    autovacuum_vac_scale;
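
For concreteness, here is that fallback chain spelled out as a small
compilable sketch (struct and variable names are illustrative stand-ins,
not the actual patch code), together with the resulting default of
50 + 0.2 * reltuples:

#include <stdio.h>

typedef struct RelOpts
{
    double  vacuum_ins_scale_factor;    /* -1 means "not set on this table" */
    double  vacuum_scale_factor;        /* -1 means "not set on this table" */
} RelOpts;

/* GUC stand-ins; -1 means "unset", fall through to the next source */
static double autovacuum_vac_ins_scale = -1;
static double autovacuum_vac_scale = 0.2;

static double
effective_ins_scale(const RelOpts *relopts)
{
    if (relopts && relopts->vacuum_ins_scale_factor >= 0)
        return relopts->vacuum_ins_scale_factor;
    if (autovacuum_vac_ins_scale >= 0)
        return autovacuum_vac_ins_scale;
    if (relopts && relopts->vacuum_scale_factor >= 0)
        return relopts->vacuum_scale_factor;
    return autovacuum_vac_scale;
}

int
main(void)
{
    RelOpts opts = {-1, -1};
    double  reltuples = 1000000;
    double  ins_thresh = 50 + effective_ins_scale(&opts) * reltuples;

    /* with everything unset this degrades to 50 + 0.2 * n = 200050 */
    printf("insert-vacuum threshold: %.0f\n", ins_thresh);
    return 0;
}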

One would disable autovacuum triggered by insertions by setting
autovacuum_vacuum_insert_scale_factor=1e10 (which I think should also be the
max for this patch).

It seems to me that the easy thing to do is to implement this initially without
FREEZE (which is controlled by vacuum_freeze_table_age), and defer until
July/v14 further discussion and implementation of another GUC/relopt for
autovacuum freezing to be controlled by insert thresholds (or ratio).

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-16 Thread Laurenz Albe
On Mon, 2020-03-16 at 12:53 +0900, Masahiko Sawada wrote:
> There is already a consensus on introducing 2 new parameters, but as
> a second idea I'd like to add one (or two) GUC(s) to my suggestion,
> say autovacuum_vacuum_freeze_insert_ratio; this parameter is the ratio
> of the number of inserted tuples to the total number of tuples modified
> and inserted, used to trigger insert-only vacuum. For example,
> suppose the table has 1,000,000 tuples and we set threshold = 0,
> scale_factor = 0.2 and freeze_insert_ratio = 0.9: we will trigger
> normal autovacuum when n_dead_tup + n_ins_since_vacuum > 200,000, but
> we will instead trigger insert-only autovacuum, which is a vacuum with
> vacuum_freeze_min_age = 0, when n_ins_since_vacuum > 180,000 (= 200,000
> * 0.9). IOW, if 90% of modified tuples are insertions, we freeze tuples
> aggressively. If we want to trigger insert-only vacuum only on
> insert-only tables, we can set freeze_insert_ratio = 1.0. The downside
> of this idea is that we cannot disable autovacuum triggered by the
> number of inserted tuples, although we might be able to introduce one
> more GUC that controls whether to include the number of inserted tuples
> for triggering autovacuum (say, autovacuum_vacuum_triggered_by_insert =
> on|off). The pro of this idea would be that we can ensure that
> insert-only vacuum will run only when the ratio of
> insertions is large enough.

Two more parameters :^(  But your reasoning is good.

How about we go with what we have now and leave that for future
discussion and patches?

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-15 Thread Justin Pryzby
On Fri, Mar 13, 2020 at 10:48:27PM +0100, Laurenz Albe wrote:
> On Fri, 2020-03-13 at 13:44 -0500, Justin Pryzby wrote:
> > Possible it would be better to run VACUUM *without* freeze_min_age=0 ?  (I get
> > confused and have to spend 20min re-reading the vacuum GUC docs every time I
> > deal with this stuff, so maybe I'm off).
> > 
> > As I understand, the initial motivation of this patch was to avoid disruptive
> > anti-wraparound vacuums on insert-only table.  But if vacuum were triggered at
> > all, it would freeze the oldest tuples, which is all that's needed; especially
> > since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need
> > to be vacuumed again.  Recently written tuples wouldn't be frozen, which is ok,
> > they're handled next time.
> 
> Freezing tuples too early is wasteful if the tuples get updated or deleted
> soon after, but based on the assumption that an autovacuum triggered by insert
> is dealing with an insert-mostly table, it is not that wasteful.

You're right that it's not *that* wasteful.  If it's a table that gets 90%
inserts/10% updates, then only 10% of its tuples will be frozen.  In the worst
case, it's the same tuples every time, and that's somewhat wasteful.  In the
best case, those tuples are clustered on a small number of pages.

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-15 Thread Masahiko Sawada
On Fri, 13 Mar 2020 at 05:11, David Rowley  wrote:
>
> On Fri, 13 Mar 2020 at 01:43, Masahiko Sawada
>  wrote:
> >
> > On Thu, 12 Mar 2020 at 16:28, David Rowley  wrote:
> > > Laurenz highlighted a seemingly very valid reason that the current
> > > GUCs cannot be reused. Namely, say the table has 1 billion rows, if we
> > > use the current scale factor of 0.2, then we'll run an insert-only
> > > vacuum every 200 million rows. If those INSERTs are one per
> > > transaction then the new feature does nothing as the wraparound vacuum
> > > will run instead. Since this feature was born due to large insert-only
> > > tables, this concern seems very valid to me.
> >
> > Yeah, I understand and agree that since most people would use default
> > values we can reduce mis-configuration cases by adding separate GUCs
> > that have appropriate default values for that purpose but on the other
> > hand I'm not sure it's worth that we cover the large insert-only table
> > case by adding separate GUCs in spite of being able to cover it even
> > by existing two GUCs.
>
> In light of the case above, do you have an alternative suggestion?
>
> > If we want to disable this feature on the
> > particular table, we can have a storage parameter that means not to
> > consider the number of inserted tuples rather than having multiple
> > GUCs that allow us to do fine tuning. And IIUC even in the above case, I
> > think that if we trigger insert-only vacuum by comparing the number of
> > inserted tuples to the threshold computed by existing threshold and
> > scale factor, we can cover it.
>
> So you're suggesting we drive the insert-vacuums from existing
> scale_factor and threshold?  What about the 1 billion row table
> example above?

My suggestion is the initial approach proposed by Justin: comparing
the number of inserted tuples to the threshold computed by
autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor in order
to trigger autovacuum. But as discussed, there is a downside: if the
number of inserted tuples is almost the same as, but a little larger
than, the number of dead tuples, we will trigger insert-only vacuum
but it's wasteful.

There is already a consensus on introducing 2 new parameters, but as
a second idea I'd like to add one (or two) GUC(s) to my suggestion,
say autovacuum_vacuum_freeze_insert_ratio; this parameter is the ratio
of the number of inserted tuples to the total number of tuples modified
and inserted, used to trigger insert-only vacuum. For example,
suppose the table has 1,000,000 tuples and we set threshold = 0,
scale_factor = 0.2 and freeze_insert_ratio = 0.9: we will trigger
normal autovacuum when n_dead_tup + n_ins_since_vacuum > 200,000, but
we will instead trigger insert-only autovacuum, which is a vacuum with
vacuum_freeze_min_age = 0, when n_ins_since_vacuum > 180,000 (= 200,000
* 0.9). IOW, if 90% of modified tuples are insertions, we freeze tuples
aggressively. If we want to trigger insert-only vacuum only on
insert-only tables, we can set freeze_insert_ratio = 1.0. The downside
of this idea is that we cannot disable autovacuum triggered by the
number of inserted tuples, although we might be able to introduce one
more GUC that controls whether to include the number of inserted tuples
for triggering autovacuum (say, autovacuum_vacuum_triggered_by_insert =
on|off). The pro of this idea would be that we can ensure that
insert-only vacuum will run only when the ratio of
insertions is large enough.
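
Expressed as a minimal self-contained C sketch of that decision rule (the
names and the order of the checks are illustrative only, not actual
autovacuum code):

#include <stdio.h>

typedef enum { NO_VACUUM, NORMAL_VACUUM, INSERT_ONLY_VACUUM } VacuumKind;

static VacuumKind
decide_vacuum(double reltuples, double n_dead_tup, double n_ins_since_vacuum,
              double threshold, double scale_factor, double freeze_insert_ratio)
{
    double  vacthresh = threshold + scale_factor * reltuples;   /* 200,000 here */
    double  insthresh = vacthresh * freeze_insert_ratio;        /* 180,000 here */

    if (n_ins_since_vacuum > insthresh)
        return INSERT_ONLY_VACUUM;      /* vacuum with vacuum_freeze_min_age = 0 */
    if (n_dead_tup + n_ins_since_vacuum > vacthresh)
        return NORMAL_VACUUM;
    return NO_VACUUM;
}

int
main(void)
{
    /* 190,000 inserts, 10,000 dead tuples -> insert-only vacuum (prints 2) */
    printf("%d\n", decide_vacuum(1000000, 10000, 190000, 0, 0.2, 0.9));
    /* 60,000 inserts, 150,000 dead tuples -> normal vacuum (prints 1) */
    printf("%d\n", decide_vacuum(1000000, 150000, 60000, 0, 0.2, 0.9));
    return 0;
}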

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-15 Thread Justin Pryzby
On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> > Having now played with the patch, I'll suggest that 1000 is too high a
> > threshold.  If autovacuum runs without FREEZE, I don't see why it couldn't be
> > much lower (10?) or use (0.2 * n_ins + 50) like the other autovacuum GUC.
> 
> ISTM that the danger of regressing workloads due to suddenly repeatedly
> scanning huge indexes that previously were never / rarely scanned is
> significant

You're right - at one point, I was going to argue to skip index cleanup, and I
think I wrote that before I finished convincing myself why it wasn't ok to skip.

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-13 Thread Justin Pryzby
On Fri, Mar 13, 2020 at 02:38:51PM -0700, Andres Freund wrote:
> > |One disadvantage of decreasing vacuum_freeze_min_age is that it might cause
> > |VACUUM to do useless work: freezing a row version is a waste of time if the row
> > |is modified soon thereafter (causing it to acquire a new XID). So the setting
> > |should be large enough that rows are not frozen until they are unlikely to
> > |change any more.
> 
> I think the overhead here might be a bit overstated. Once a page is

Could you clarify if you mean the language in the docs in general or specifically
in the context of this patch?

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-13 Thread Laurenz Albe
On Fri, 2020-03-13 at 13:44 -0500, Justin Pryzby wrote:
> Possible it would be better to run VACUUM *without* freeze_min_age=0 ?  (I get
> confused and have to spend 20min re-reading the vacuum GUC docs every time I
> deal with this stuff, so maybe I'm off).
> 
> As I understand, the initial motivation of this patch was to avoid disruptive
> anti-wraparound vacuums on insert-only table.  But if vacuum were triggered at
> all, it would freeze the oldest tuples, which is all that's needed; especially
> since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need
> to be vacuumed again.  Recently written tuples wouldn't be frozen, which is ok,
> they're handled next time.

Freezing tuples too early is wasteful if the tuples get updated or deleted
soon after, but based on the assumption that an autovacuum triggered by insert
is dealing with an insert-mostly table, it is not that wasteful.

If we didn't freeze all tuples, it would be easy to envision a situation where
a bulk data load inserts several million rows in a few transactions, which
would trigger a vacuum.  With the normal vacuum_freeze_min_age, that vacuum
would do nothing at all.  It is better if each vacuum freezes some rows,
in other words, if it does some of the anti-wraparound work.
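
To make that concrete, here is a much-simplified, self-contained sketch of
the per-tuple freeze test (illustrative only: real vacuum uses OldestXmin
and handles wraparound and MultiXacts, all ignored here).  Tuples from a
bulk load done a few transactions ago have an XID age of just a few, far
below the default vacuum_freeze_min_age of 50 million, so such a vacuum
freezes nothing unless freeze_min_age is 0:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool
tuple_would_freeze(uint32_t tuple_xmin, uint32_t next_xid, uint32_t freeze_min_age)
{
    uint32_t    cutoff_xid = next_xid - freeze_min_age; /* freeze anything older */

    return tuple_xmin < cutoff_xid;
}

int
main(void)
{
    uint32_t    next_xid = 60000000;
    uint32_t    bulk_xmin = next_xid - 3;   /* inserted three transactions ago */

    /* default vacuum_freeze_min_age = 50 million: not frozen (prints 0) */
    printf("%d\n", tuple_would_freeze(bulk_xmin, next_xid, 50000000));
    /* freeze_min_age = 0, as this patch uses: frozen (prints 1) */
    printf("%d\n", tuple_would_freeze(bulk_xmin, next_xid, 0));
    return 0;
}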

> Another motivation of the patch is to allow indexonly scan, for which the
> planner looks at pages' "relallvisible" fraction (and at execution if a page
> isn't allvisible, visits the heap).  Again, that happens if vacuum were run at
> all.  Again, some pages won't be marked allvisible, which is fine, they're
> handled next time.

Yes, freezing is irrelevant with respect to index only scans, but it helps
with mitigating the impact of anti-wraparound vacuum runs.

> I think freeze_min_age=0 could negatively affect people who have insert-mostly
> tables (I'm not concerned, but that includes us).  If they consistently hit the
> autovacuum insert threshold before the cleanup threshold for updated/deleted
> tuples, any updated/deleted tuples would be frozen, which would be
> wasteful:  

I don't get that.  Surely tuples whose xmax is committed won't be frozen.

> So my question is if autovacuum triggered by insert threshold should trigger
> VACUUM with the same settings as a vacuum due to deleted tuples.  I realize the
> DBA could just configure the thresholds so they'd hit vacuum for cleaning dead
> tuples, so my suggestion maybe just improves the case with the default
> settings.  It's possible to set the reloption autovacuum_freeze_min_age, which
> I think supports the idea of running a vacuum normally and letting it (and the
> DBA) decide what to do with existing logic.

Yes, the DBA can explicitly set vacuum_freeze_min_age to 0.

But for one DBA who understands his or her workload well enough, and who knows
the workings of autovacuum well enough to do that kind of tuning, there are
99 DBAs who don't, and it is the goal of the patch (expressed in the subject)
to make things work for those people who go with the default.

And I believe that is better achieved with freezing as many tuples as possible.

> Also, there was a discussion about index cleanup with the conclusion that it
> was safer not to skip it, since otherwise indexes might bloat.  I think that's
> right, since vacuum for cleanup is triggered by the number of dead heap tuples.
> To skip index cleanup, I think you'd want a metric for
> n_dead_since_index_cleanup.  (Or maybe analyze could track dead index tuples
> and trigger vacuum of each index separately).

Yes, I think we pretty much all agree on that.

> Having now played with the patch, I'll suggest that 1000 is too high a
> threshold.  If autovacuum runs without FREEZE, I don't see why it couldn't be
> much lower (10?) or use (0.2 * n_ins + 50) like the other autovacuum GUC.

There is the concern that that might treat large tables too seldom.

I am curious - what were the findings that led you to think that 1000
is too high?

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-13 Thread Andres Freund
Hi,

On 2020-03-13 13:44:42 -0500, Justin Pryzby wrote:
> As I understand, the initial motivation of this patch was to avoid disruptive
> anti-wraparound vacuums on insert-only table.  But if vacuum were triggered at
> all, it would freeze the oldest tuples, which is all that's needed; especially
> since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need
> to be vacuumed again.  Recently written tuples wouldn't be frozen, which is ok,
> they're handled next time.
> 
> Another motivation of the patch is to allow indexonly scan, for which the
> planner looks at pages' "relallvisible" fraction (and at execution if a page
> isn't allvisible, visits the heap).  Again, that happens if vacuum were run at
> all.  Again, some pages won't be marked allvisible, which is fine, they're
> handled next time.
> 
> I think freeze_min_age=0 could negatively affect people who have insert-mostly
> tables (I'm not concerned, but that includes us).  If they consistently hit the
> autovacuum insert threshold before the cleanup threshold for updated/deleted
> tuples, any updated/deleted tuples would be frozen, which would be
> wasteful:  

I think that's a valid concern.


> |One disadvantage of decreasing vacuum_freeze_min_age is that it might cause
> |VACUUM to do useless work: freezing a row version is a waste of time if the row
> |is modified soon thereafter (causing it to acquire a new XID). So the setting
> |should be large enough that rows are not frozen until they are unlikely to
> |change any more.

I think the overhead here might be a bit overstated. Once a page is
dirtied (or already dirty) during vacuum, and we freeze a single row
(necessitating WAL logging), there's not really a good reason to not also
freeze the rest of the rows on that page. The added cost for freezing
another row is minuscule compared to the "constant" cost of freezing
anything on the page.  It's of course different if there are otherwise
no tuples worth freezing on the page (not uncommon). But there's really
no reason for that to be the case:

Afaict the only problem with more aggressively freezing when we touch
(beyond hint bits) the page anyway is that we commonly end up with
multiple WAL records for the same page:

1) lazy_scan_heap()->heap_page_prune() will log a XLOG_HEAP2_CLEAN record, but
   leave itemids in place most of the time
2) lazy_scan_heap()->log_heap_freeze() will log a XLOG_HEAP2_FREEZE_PAGE record
3a) if no indexes exist/index cleanup is disabled:
  lazy_scan_heap()->lazy_vacuum_page() will log a XLOG_HEAP2_CLEAN
  record, removing dead tuples (including itemids)
3b) if indexes need to be cleaned up,
  lazy_vacuum_heap()->lazy_vacuum_page() will log a XLOG_HEAP2_CLEAN

which is not nice. It likely is worth merging xl_heap_freeze_page into
xl_heap_clean, and having heap pruning always freeze once it decides to
dirty a page.

We could probably always prune dead tuples as part of heap_prune_chain()
if there's no indexes - but I'm doubtful it's worth it, since there'll
be few tables with lots of dead tuples that don't have indexes.

Merging 3b's WAL record would be harder, I think.


There's also a significant source of additional WAL records here, one
that I think should really not have been introduced:

4) HeapTupleSatisfiesVacuum() called both by heap_prune_chain() and
  lazy_scan_heap() will often trigger a WAL record when checksums or
  wal_log_hints are enabled. If the page hasn't been modified in the
  current checkpoint window (extremely common for VACUUM, reasonably
  common for opportunistic pruning), we will log a full page write.

  Imo this really should have been avoided when checksums were added,
  that's a pretty substantial and unnecessary increase in overhead.


It's probably overkill to tie fixing the 'insert only' case to improving
the WAL logging for vacuuming / pruning. But it'd certainly would
largely remove the tradeoff discussed here, by removing additional
overhead of freezing in tables that are also updated.


> Also, there was a discussion about index cleanup with the conclusion that it
> was safer not to skip it, since otherwise indexes might bloat.  I think that's
> right, since vacuum for cleanup is triggered by the number of dead heap tuples.
> To skip index cleanup, I think you'd want a metric for
> n_dead_since_index_cleanup.  (Or maybe analyze could track dead index tuples
> and trigger vacuum of each index separately).
> 
> Having now played with the patch, I'll suggest that 1000 is too high a
> threshold.  If autovacuum runs without FREEZE, I don't see why it couldn't be
> much lower (10?) or use (0.2 * n_ins + 50) like the other autovacuum GUC.

ISTM that the danger of regressing workloads due to suddenly repeatedly
scanning huge indexes that previously were never / rarely scanned is
significant (if there are a few dead tuples; otherwise most indexes will
be able to skip the scan since the vacuum_cleanup_index_scale_factor
introduction).

Greetings,

Andres Freund

Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-13 Thread Justin Pryzby
On Tue, Mar 10, 2020 at 01:53:42PM +1300, David Rowley wrote:
> 2. Perhaps the documentation in maintenance.sgml should mention that
> the table will be vacuumed with the equivalent of having
> vacuum_freeze_min_age = 0, instead of:
> 
> "Such a vacuum will aggressively freeze tuples."
> 
> aggressive is the wrong word here. We call it an aggressive vacuum if
> we disable page skipping, not for setting the vacuum_freeze_min_age to
> 0.

Possible it would be better to run VACUUM *without* freeze_min_age=0 ?  (I get
confused and have to spend 20min re-reading the vacuum GUC docs every time I
deal with this stuff, so maybe I'm off).

As I understand, the initial motivation of this patch was to avoid disruptive
anti-wraparound vacuums on insert-only table.  But if vacuum were triggered at
all, it would freeze the oldest tuples, which is all that's needed; especially
since fd31cd2651 "Don't vacuum all-frozen pages.", those pages would never need
to be vacuumed again.  Recently written tuples wouldn't be frozen, which is ok,
they're handled next time.

Another motivation of the patch is to allow indexonly scan, for which the
planner looks at pages' "relallvisible" fraction (and at execution if a page
isn't allvisible, visits the heap).  Again, that happens if vacuum were run at
all.  Again, some pages won't be marked allvisible, which is fine, they're
handled next time.

I think freeze_min_age=0 could negatively affect people who have insert-mostly
tables (I'm not concerned, but that includes us).  If they consistently hit the
autovacuum insert threshold before the cleanup threshold for updated/deleted
tuples, any updated/deleted tuples would be frozen, which would be
wasteful:  

|One disadvantage of decreasing vacuum_freeze_min_age is that it might cause
|VACUUM to do useless work: freezing a row version is a waste of time if the row
|is modified soon thereafter (causing it to acquire a new XID). So the setting
|should be large enough that rows are not frozen until they are unlikely to
|change any more.

So my question is if autovacuum triggered by insert threshold should trigger
VACUUM with the same settings as a vacuum due to deleted tuples.  I realize the
DBA could just configure the thresholds so they'd hit vacuum for cleaning dead
tuples, so my suggestion maybe just improves the case with the default
settings.  It's possible to set the reloption autovacuum_freeze_min_age, which
I think supports the idea of running a vacuum normally and letting it (and the
DBA) decide what to do with existing logic.

Also, there was a discussion about index cleanup with the conclusion that it
was safer not to skip it, since otherwise indexes might bloat.  I think that's
right, since vacuum for cleanup is triggered by the number of dead heap tuples.
To skip index cleanup, I think you'd want a metric for
n_dead_since_index_cleanup.  (Or maybe analyze could track dead index tuples
and trigger vacuum of each index separately).

Having now played with the patch, I'll suggest that 1000 is too high a
threshold.  If autovacuum runs without FREEZE, I don't see why it couldn't be
much lower (10?) or use (0.2 * n_ins + 50) like the other autovacuum GUC.  

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-13 Thread Laurenz Albe
On Fri, 2020-03-13 at 07:00 -0500, Justin Pryzby wrote:
> > 2. The new feature can be completely disabled. This might be very
> > useful for people who suffer from auto-vacuum starvation.
> 
> > Yes, but in particular so it can be completely disabled easily.
> 
> How is it disabled ?  By setting scale_factor=100 ?
> 
> +   {
> +       "autovacuum_vacuum_insert_scale_factor",
> +       "Number of tuple inserts prior to vacuum as a fraction of reltuples",
> +       RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
> +       ShareUpdateExclusiveLock
> +   },
> +   -1, 0.0, 100.0
> 
> Note, vacuum_cleanup_index_scale_factor uses max: 1e10
> See 4d54543efa5eb074ead4d0fadb2af4161c943044

By setting the threshold very high, or by setting the scale factor to 100.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-13 Thread Laurenz Albe
On Fri, 2020-03-13 at 12:05 +0300, Darafei "Komяpa" Praliaskouski wrote:
> > 1. introduce no new parameters and trigger autovacuum if the number
> > of inserts exceeds the regular vacuum threshold.
> > 
> > 2. introduce the new parameters with high base threshold and zero scale factor.
> 
> Both of these look good to me.  1 is the approach in my initial patch
> sketch, 2 is the approach taken by Laurenz.
> The values I think in when considering vacuum are "how many megabytes of
> the table aren't frozen/visible" (since that's what translates into
> processing time given the I/O limits of the storage), and "how many pages
> aren't yet vacuumed".
> 
> The threshold in Laurenz's patch was good enough for my taste - it's
> basically "vacuum after every gigabyte", and that's exactly what we
> implemented when working around this issue manually. There's a good
> chance that the latest gigabyte is in RAM and vacuum will be super fast on
> it; reading a gigabyte of data is not a showstopper for most
> contemporary physical and cloud environments I can think of. If
> reading a gigabyte is already a problem, then wraparound is a
> guaranteed disaster.
> 
> About index-only scans, this threshold seems good enough too. There's a
> good chance the last gigabyte is already in RAM, and previous data was
> processed by the previous vacuum. Anyway - with this patch Index Only
> Scan actually starts working :)
> 
> I'd vote for 2 with a note "rip it all out later and redesign the
> scale factor and threshold system into something more easily
> graspable". Whoever needs to cancel the new behavior for some reason
> will have a knob then, and the patch is already laid out.
> 
> > 3. introduce the new parameters with low base threshold and high scale factor.
> 
> This looks bad to me. "The bigger the table, the longer we wait" does
> not look good to me for something designed as a measure to prevent
> issues with big tables.

Thanks for the feedback.

It looks like we have a loose consensus on #2, i.e. my patch.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-13 Thread Justin Pryzby
On Wed, Mar 11, 2020 at 10:32:47AM +1300, David Rowley wrote:
> 2. The new feature can be completely disabled. This might be very
> useful for people who suffer from auto-vacuum starvation.

On Thu, Mar 12, 2020 at 08:28:05PM +1300, David Rowley wrote:
> Yes, but in particular so it can be completely disabled easily.

How is it disabled ?  By setting scale_factor=100 ?

+   {
+       "autovacuum_vacuum_insert_scale_factor",
+       "Number of tuple inserts prior to vacuum as a fraction of reltuples",
+       RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+       ShareUpdateExclusiveLock
+   },
+   -1, 0.0, 100.0

Note, vacuum_cleanup_index_scale_factor uses max: 1e10
See 4d54543efa5eb074ead4d0fadb2af4161c943044

-- 
Justin




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-13 Thread Komяpa
On Fri, Mar 13, 2020 at 3:19 AM Laurenz Albe  wrote:
>
> On Fri, 2020-03-13 at 09:10 +1300, David Rowley wrote:
> > So you're suggesting we drive the insert-vacuums from existing
> > scale_factor and threshold?  What about the 1 billion row table
> > example above?
>
> I am still not 100% certain if that is really realistic.
> Transactions that insert only a single row are probably the
> exception in large insert-only tables.
>
> But I think that we probably always can find a case where any given
> parameter setting is not so great, so in order to get ahead
> let's decide on something that is not right out stupid.
> Changing the defaults later is always an option.
>
> So the three options are:
>
> 1. introduce no new parameters and trigger autovacuum if the number
>of inserts exceeds the regular vacuum threshold.
>
> 2. introduce the new parameters with high base threshold and zero scale factor.

Both of these look good to me.  1 is the approach in my initial patch
sketch, 2 is the approach taken by Laurenz.
The values I think in when considering vacuum are "how many megabytes of
the table aren't frozen/visible" (since that's what translates into
processing time given the I/O limits of the storage), and "how many pages
aren't yet vacuumed".

The threshold in Laurenz's patch was good enough for my taste - it's
basically "vacuum after every gigabyte", and that's exactly what we
implemented when working around this issue manually. There's a good
chance that the latest gigabyte is in RAM and vacuum will be super fast on
it; reading a gigabyte of data is not a showstopper for most
contemporary physical and cloud environments I can think of. If
reading a gigabyte is already a problem, then wraparound is a
guaranteed disaster.

About index-only scans, this threshold seems good enough too. There's a
good chance the last gigabyte is already in RAM, and previous data was
processed by the previous vacuum. Anyway - with this patch Index Only
Scan actually starts working :)

I'd vote for 2 with a note "rip it all out later and redesign the
scale factor and threshold system into something more easily
graspable". Whoever needs to cancel the new behavior for some reason
will have a knob then, and the patch is already laid out.

> 3. introduce the new parameters with low base threshold and high scale factor.

This looks bad to me. "The bigger the table, the longer we wait" does
not look good to me for something designed as a measure to prevent
issues with big tables.

> I think all three are viable.
> If nobody else wants to weigh in, throw a coin.
>
> Yours,
> Laurenz Albe
>


-- 
Darafei Praliaskouski
Support me: http://patreon.com/komzpa




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-12 Thread Laurenz Albe
On Fri, 2020-03-13 at 09:10 +1300, David Rowley wrote:
> So you're suggesting we drive the insert-vacuums from existing
> scale_factor and threshold?  What about the 1 billion row table
> example above?

I am still not 100% certain if that is really realistic.
Transactions that insert only a single row are probably the
exception in large insert-only tables.

But I think that we probably always can find a case where any given
parameter setting is not so great, so in order to get ahead
let's decide on something that is not right out stupid.
Changing the defaults later is always an option.

So the three options are:

1. introduce no new parameters and trigger autovacuum if the number
   of inserts exceeds the regular vacuum threshold.

2. introduce the new parameters with high base threshold and zero scale factor.

3. introduce the new parameters with low base threshold and high scale factor.

I think all three are viable.
If nobody else wants to weigh in, throw a coin.

Yours,
Laurenz Albe





Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-12 Thread David Rowley
On Fri, 13 Mar 2020 at 01:43, Masahiko Sawada
 wrote:
>
> On Thu, 12 Mar 2020 at 16:28, David Rowley  wrote:
> > Laurenz highlighted a seemingly very valid reason that the current
> > GUCs cannot be reused. Namely, say the table has 1 billion rows, if we
> > use the current scale factor of 0.2, then we'll run an insert-only
> > vacuum every 200 million rows. If those INSERTs are one per
> > transaction then the new feature does nothing as the wraparound vacuum
> > will run instead. Since this feature was born due to large insert-only
> > tables, this concern seems very valid to me.
>
> Yeah, I understand and agree that since most people would use default
> values we can reduce mis-configuration cases by adding separate GUCs
> that have appropriate default values for that purpose but on the other
> hand I'm not sure it's worth that we cover the large insert-only table
> case by adding separate GUCs in spite of being able to cover it even
> by existing two GUCs.

In light of the case above, do you have an alternative suggestion?

> If we want to disable this feature on the
> particular table, we can have a storage parameter that means not to
> consider the number of inserted tuples rather than having multiple
> GUCs that allow us to do fine tuning. And IIUC even in the above case, I
> think that if we trigger insert-only vacuum by comparing the number of
> inserted tuples to the threshold computed by existing threshold and
> scale factor, we can cover it.

So you're suggesting we drive the insert-vacuums from existing
scale_factor and threshold?  What about the 1 billion row table
example above?




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-12 Thread Masahiko Sawada
On Thu, 12 Mar 2020 at 16:28, David Rowley  wrote:
>
> On Thu, 12 Mar 2020 at 19:50, Masahiko Sawada
>  wrote:
> > The reason why you want to add new GUC parameters is to use different
> > default values for insert-update table case and insert-only table
> > case?
>
> Yes, but in particular so it can be completely disabled easily.
>
> > I think I understand the pros and cons of adding separate
> > parameters, but I still cannot understand use cases where we cannot
> > handle without separate parameters.
>
> That's a lot of negatives. I think I understand that you don't feel
> that additional GUCs are worth it?
>
> Laurenz highlighted a seemingly very valid reason that the current
> GUCs cannot be reused. Namely, say the table has 1 billion rows, if we
> use the current scale factor of 0.2, then we'll run an insert-only
> vacuum every 200 million rows. If those INSERTs are one per
> transaction then the new feature does nothing as the wraparound vacuum
> will run instead. Since this feature was born due to large insert-only
> tables, this concern seems very valid to me.

Yeah, I understand and agree that since most people would use default
values we can reduce mis-configuration cases by adding separate GUCs
that have appropriate default values for that purpose but on the other
hand I'm not sure it's worth that we cover the large insert-only table
case by adding separate GUCs in spite of being able to cover it even
by existing two GUCs. If we want to disable this feature on the
particular table, we can have a storage parameter that means not to
consider the number of inserted tuples rather than having multiple
GUCs that allows us to fine tuning. And IIUC even in the above case, I
think that if we trigger insert-only vacuum by comparing the number of
inserted tuples to the threshold computed by existing threshold and
scale factor, we can cover it. But since you and Laurenz already
agreed to adding two GUCs I'm not going to insist on that.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-12 Thread David Rowley
On Thu, 12 Mar 2020 at 19:50, Masahiko Sawada
 wrote:
> The reason why you want to add new GUC parameters is to use different
> default values for insert-update table case and insert-only table
> case?

Yes, but in particular so it can be completely disabled easily.

> I think I understand the pros and cons of adding separate
> parameters, but I still cannot understand use cases where we cannot
> handle without separate parameters.

That's a lot of negatives. I think I understand that you don't feel
that additional GUCs are worth it?

Laurenz highlighted a seemingly very valid reason that the current
GUCs cannot be reused. Namely, say the table has 1 billion rows, if we
use the current scale factor of 0.2, then we'll run an insert-only
vacuum every 200 million rows. If those INSERTs are one per
transaction then the new feature does nothing as the wraparound vacuum
will run instead. Since this feature was born due to large insert-only
tables, this concern seems very valid to me.




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-11 Thread Masahiko Sawada
On Thu, 12 Mar 2020 at 14:38, Laurenz Albe  wrote:
>
> On Thu, 2020-03-12 at 17:47 +1300, David Rowley wrote:
> > I'm starting to think that we should set the scale_factor to something
> > like 0.3 and the threshold to 50. Is anyone strongly against that?  Or
> > Laurenz, are you really set on the 10 million threshold?
>
> These values are almost the same as "autovacuum_vacuum_scale_factor"
> and "autovacuum_vacuum_threshold", so you actually agree with Masahiko
> with the exception that you want it tunable separately.
>
> I don't like the high scale factor.
>
> If your insert-only table was last vacuumed when it had 500 million rows,
> the next autovacuum will freeze 150 million tuples, which is a lot.
> The impact will be less than that of an anti-wraparound vacuum because
> it is not as persistent, but if our 150 million tuple autovacuum backs
> down because it hits a lock or gets killed by the DBA, that is also not
> good, since it will just come again.
> And the bigger the vacuum run is, the more likely it is to meet an obstacle.
>
> So I think that large insert-only tables should be vacuumed more often
> than that.  If the number of tuples that have to be frozen is small,
> the vacuum run will be short and is less likely to cause problems.
> That is why I chose a scale factor of 0 here.

The reason why you want to add new GUC parameters is to use different
default values for insert-update table case and insert-only table
case? I think I understand the pros and cons of adding separate
parameters, but I still cannot understand use cases where we cannot
handle without separate parameters.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-11 Thread David Rowley
On Thu, 12 Mar 2020 at 18:38, Laurenz Albe  wrote:
>
> On Thu, 2020-03-12 at 17:47 +1300, David Rowley wrote:
> > Laurenz, are you really set on the 10 million threshold?
>
> These values are almost the same as "autovacuum_vacuum_scale_factor"
> and "autovacuum_vacuum_threshold", so you actually agree with Masahiko
> with the exception that you want it tunable separately.
>
> I don't like the high scale factor.
>
> If your insert-only table was last vacuumed when it had 500 million rows,
> the next autovacuum will freeze 150 million tuples, which is a lot.
> The impact will be less than that of an anti-wraparound vacuum because
> it is not as persistent, but if our 150 million tuple autovacuum backs
> down because it hits a lock or gets killed by the DBA, that is also not
> good, since it will just come again.
> And the bigger the vacuum run is, the more likely it is to meet an obstacle.
>
> So I think that large insert-only tables should be vacuumed more often
> than that.  If the number of tuples that have to be frozen is small,
> the vacuum run will be short and is less likely to cause problems.
> That is why I chose a scale factor of 0 here.

That's a good point.  If those 150 million inserts were done one per
transaction, then it wouldn't take many more tuples before wraparound
vacuums occur more often than insert vacuums.  The only way I see
around that is to a) configure it the way you'd like, or b) add yet
another GUC and reloption to represent how close to
autovacuum_freeze_max_age / autovacuum_multixact_freeze_max_age the
table is.  I'm not very excited about adding yet another GUC, plus
anti-wraparound vacuums already occur 10 times more often than they
need to. If we added such a GUC and set it to, say, 0.1, then they'd
happen 100 times more often than needed before actual wraparound
occurs.
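
Back-of-the-envelope arithmetic for that crossover, as a small standalone
program (illustrative only: it assumes one INSERT per transaction, so XID
age advances in step with the insert count, and uses the
autovacuum_freeze_max_age default of 200 million):

#include <stdio.h>

int
main(void)
{
    double  base_threshold = 50;
    double  ins_scale = 0.3;            /* proposed insert scale factor */
    double  freeze_max_age = 200000000; /* autovacuum_freeze_max_age default */
    double  reltuples = 500000000;      /* rows at the last vacuum */

    double  ins_thresh = base_threshold + ins_scale * reltuples;

    /* insert-triggered vacuum would fire after ~150 million more inserts */
    printf("insert vacuum after %.0f inserts\n", ins_thresh);

    /* table size at which the insert threshold alone exceeds freeze_max_age,
     * so the anti-wraparound vacuum always arrives first: ~667 million rows */
    printf("crossover at %.0f rows\n", (freeze_max_age - base_threshold) / ins_scale);
    return 0;
}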

I'm starting to see now why you were opposed to the scale_factor in
the first place.

I really think that this is a problem with the design of the
threshold and scale_factor system.  I used to commonly see people with
larger tables zeroing out the scale_factor and setting a reasonable
threshold or dropping the scale_factor down to some fraction of a
percent. I don't really have any better design in mind though, at
least not one that does not require adding new vacuum options.




Re: Berserk Autovacuum (let's save next Mandrill)

2020-03-11 Thread Laurenz Albe
On Thu, 2020-03-12 at 17:47 +1300, David Rowley wrote:
> I'm starting to think that we should set the scale_factor to something
> like 0.3 and the threshold to 50. Is anyone strongly against that?  Or
> Laurenz, are you really set on the 10 million threshold?

These values are almost the same as "autovacuum_vacuum_scale_factor"
and "autovacuum_vacuum_threshold", so you actually agree with Masahiko
with the exception that you want it tunable separately.

I don't like the high scale factor.

If your insert-only table was last vacuumed when it had 500 million rows,
the next autovacuum will freeze 150 million tuples, which is a lot.
The impact will be less than that of an anti-wraparound vacuum because
it is not as persistent, but if our 150 million tuple autovacuum backs
down because it hits a lock or gets killed by the DBA, that is also not
good, since it will just come again.
And the bigger the vacuum run is, the more likely it is to meet an obstacle.

So I think that large insert-only tables should be vacuumed more often
than that.  If the number of tuples that have to be frozen is small,
the vacuum run will be short and is less likely to cause problems.
That is why I chose a scale factor of 0 here.


But I totally see your point about index-only scans.

I think the problem is that this insert-only autovacuum serves two masters:
1. preventing massive anti-wraparound vacuum that severely impacts the system
2. maintaining the visibility map for index-only scans

I thought of the first case when I chose the parameter values.

I am afraid that we cannot come up with one setting that fits all, so I
advocate a setting that targets the first problem, which I think is more
important (and was the motivation for this thread).

I could add a paragraph to the documentation that tells people how to
configure the parameters if they want to use it to get index-only scans.

Yours,
Laurenz Albe




