Re: What specific circumstances trigger Autovacuum wraparound

2018-06-05 Thread Daniel Lagerman
Hello David!

Thanks for taking the time to respond. Upgrading is not in scope right now
for this server even though that is certainly something I will do down the
line.
I talked to a colleague of mine and he dug up something very interesting
and probably answers the behavior I'm seeing, from PG archives. So I
thought I would post it here so that if this question comes up again its
more easy to find it in the future for PG 9.4 specifically.

There is another setting that controls Autovacuum called
autovacuum_multixact_freeze_max_age, in PG 9.5 it is easy to get the
correct value for the table to see if you hit the default limit of 400
million, however in PG 9.4 and earlier the function mxid_age() does not
exist, so you can't query this value directly, hence this reference from
this PG Archive post is crucial:
http://www.postgresql-archive.org/could-not-access-status-of-transaction-pg-multixact-issue-td5822248.html

The thread above reference a bug thread BUG #11264
https://www.postgresql.org/message-id/flat/20140827175105.GI7046%40eldon.alvh.no-ip.org#20140827175105.gi7...@eldon.alvh.no-ip.org
And in that thread we can see Alvaro Herrera write the following paragraph:

"A better way not involving mxid_age() would be to use pg_controldata to
extract the current value of the mxid counter, then subtract the current
relminmxid from that value."

Given that statement we took a look at this server and the table in
question. We could see that NextMultiXactId:  2640838929 and the relminmxid
for that specific table was 1791982183. Now if you take 2640838929 -
1791982183 = 848856746 you can see that this value is the double of the
default value of 400 million and would explain why vacuum is running:

"As a safety device, a whole-table vacuum scan will occur for any table
whose multixact-age is greater than autovacuum_multixact_freeze_max_age.
Whole-table vacuum scans will also occur progressively for all tables,
starting with those that have the oldest multixact-age, if the amount of
used member storage space exceeds the amount 50% of the addressible storage
space. Both of these kinds of whole-table scans will occur even if
autovacuum is nominally disabled."

This calculation is also confirmed by looking at the git commit of
mxid_age() this is exactly what this function returns and even though the 5
other tables are not over this value we suspect that due to that we are
using more than 50% of the addressable storage space Autovacuum takes to
"opportunity" to run for these tables also.

Now you could of course increase this value also, not sure what the max is,
but I rather want to complete vacuum. So the plan for me now is that during
the weekend, where there is no activity at all except for a backup, is to
increase the work memory from relatively low 1024 MB to use almost all
memory I have on this server, set the server to use one worker and then see
if Autovacuum by itself can solve the cleanup by itself or at least close
the gap so that are confident that we will catch up.

Best regards

Daniel


On Mon, Jun 4, 2018 at 11:47 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Monday, June 4, 2018, Daniel Lagerman  wrote:
>
>>  I have a pg 9.4.3 server that has one table with calculate age of about
>> 320 million, this is the oldest table in this database and therefore the
>> age of the DB is the same as the table.
>>
>
> It would be advisable to simply upgrade to 9.4.18 and see if any of the
> various vacuum related bug fixes solves your problem.  That's about the
> extent of my useful advice for this...
>
> David J.
>


Re: What specific circumstances trigger Autovacuum wraparound

2018-06-04 Thread David G. Johnston
On Monday, June 4, 2018, Daniel Lagerman  wrote:

>  I have a pg 9.4.3 server that has one table with calculate age of about
> 320 million, this is the oldest table in this database and therefore the
> age of the DB is the same as the table.
>

It would be advisable to simply upgrade to 9.4.18 and see if any of the
various vacuum related bug fixes solves your problem.  That's about the
extent of my useful advice for this...

David J.


What specific circumstances trigger Autovacuum wraparound

2018-06-04 Thread Daniel Lagerman
Hello!

Please note that I have read:
https://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

And I think i understand how the settings are supposed work, however I do
have a question in regards to Autovaccum wrap-around and how it is
triggered. I have a pg 9.4.3 server that has one table with calculate age
of about 320 million, this is the oldest table in this database and
therefore the age of the DB is the same as the table. This is calculated
with the SQL provided on the page above.

Before I get into the settings I have I will explain the behavior. Right
now there is 6 Autovaccum wraparound running, I have 6 workers configured
so nothing strange about that, but what puzzles me is that only 3 out of 6
tables has not recently completed an Autovaccum and only one table was over
the old limit when it started.

The settings that are set are as follows:
vacuum_freeze_min_age = 1000
autovacuum_freeze_max_age = 6
vacuum_freeze_table_age = 4
vacuum_cost_delay = 10
vacuum_cost_limit = 2000
autovacuum_vacuum_cost_limit = 200
autovacuum_vacuum_cost_delay = 20

All other settings for vacuum/autovacuum are PG 9.4.3 default. And server
has been restarted after settings were changed.

If I read the article above and I quote "autovacuum will be invoked on it
approximately once every autovacuum_freeze_max_age minus
vacuum_freeze_min_age transactions" to me it indicates that this would be
around every 590 million transactions after I did my change. But maybe it
is to late once it has triggered? With the default values it would be
around 150 million which seems to add up more that what I see.

I also read and I quote " so it can be useful to try to maximize the
interval between forced autovacuums on very large static tables. Obviously
one can do this either by increasing autovacuum_freeze_max_age or
decreasing vacuum_freeze_min_age "

In my scenario the table is not static, but I interpret that you could
delay vacuum by changing these parameters.

What also puzzles is that even once PG completes an Autovaccum (all table
except 3 right now) the Autovacuum wraparound just loops over every table
once more. Some tables have been Autovaccumed wraparound 4-5 times since
the last restart of PG server 2 days ago. It seems to me that due to that
the database itself is over the default limit Autovacuum is running on a
loop for all tables to try and get it below that limit.

There are no normal vacuums as I can see, which means I can't run anything
like flexible freeze during off hours (middle of the night) without doing a
manual cancel query on the Autovaccum which might not be a good idea.

I did attempt to run flexible freeze for 45 hours during the weekend (with
very aggressive settings compared to Autovaccum),  and it did not complete
the table which is around 1 TB large in that time. I'm worried that these
two vacuums are fighting/resetting each other and will never complete if I
keep doing that, so I have stopped that for now.

I have another server, same specs, where the same table is actually larger
in disk size, than the server i'm currently working on, where I do not see
this behavior and vacuum actually completed in a couple of days of running.

Hopefully I have not missed anything very obvious. I'm most interested in
and explanation of why PG is behaving as it is, as it's always good to
understand why something is happening even if you can't change it and also
in regards to cancel Autovacuum to run flexible freeze during off hours in
order to try and speed up the process, is it setting me back or not.

Best regards

Daniel