Rural Hunter <ruralhun...@gmail.com> wrote:

> Let me put here the whole scenario:
> 1. Our application users called me to say that all updates were
> failing, so I went to check the db. Every update transaction, including
> a manual vacuum, was blocked with the error message:
> ERROR: database is not accepting commands to avoid wraparound data loss
> in database "db1"
>   Suggestion: Stop the postmaster and use a standalone backend to
> vacuum that database.
>
> 2. Since db1 is a very large database (it is the main db the users are
> working in), I cannot afford the time a vacuum full would take on it. So I
> thought I would try the other, smaller dbs first.

Why in the world would you want to use VACUUM FULL in this circumstance?

> 3. I stop the instance.
>
> 4. I used "echo 'vacuum full;' | postgres --single -D /pgdata [other
> dbs]" to vacuum some other dbs. I still got several warning messages
> when vacuuming the first database (let's say db2):
> 2013-09-14 08:56:44 CST [5536]: [453-1] user=,db=,host= WARNING:
> database "db1" must be vacuumed within 999775 transactions
> 2013-09-14 08:56:44 CST [5536]: [454-1] user=,db=,host= HINT:  To avoid
> a database shutdown, execute a database-wide VACUUM in that database.

That is a WARNING, not an ERROR, so the VACUUM would still have run.

> Here the error message still points to db1.

I'm not sure which database would be referenced if the table which
needed the VACUUM was a shared table, like pg_database or
pg_authid.
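
For what it's worth, you can see which database the system considers
closest to the limit with a query like this (a sketch; pg_database is
a shared catalog, so it can be run from any database):

```sql
-- age(datfrozenxid) is the number of transactions consumed since the
-- database was last fully frozen; the emergency shutdown described in
-- the error message kicks in as this approaches ~2.1 billion.
SELECT datname, age(datfrozenxid) AS xid_age
  FROM pg_database
  ORDER BY age(datfrozenxid) DESC;
```

Within the suspect database you can narrow it to individual tables
with age(relfrozenxid) from pg_class, which would also show whether a
shared catalog like pg_authid is the one holding things back.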

> 5.  When I ran the single connection vacuum on other dbs(not db1), there
> was not any error/warning message. So I tried to start whole instance.
>
> 6. I started the instance and found everything was fine.
>
> So actually I have 3 questions here:
> 1. Was the db name in the error message wrong?

Probably not, to the extent that running VACUUM (FULL is not
necessary) against that database would have solved the problem.  If
it was a shared catalog table it might be that it was not the
*only* database which would work.

> 2. How would that happen? Shouldn't autovacuum handle it and avoid
> such a problem?

There are two possibilities -- either you had a long-running
transaction in the cluster or your autovacuum is not configured to
be aggressive enough to keep you out of trouble.
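
As a starting point for the second possibility, these are the settings
that matter most for anti-wraparound vacuuming (a sketch -- the full
picture depends on your workload):

```sql
-- Settings controlling when autovacuum forces a freeze pass.  In
-- particular, an anti-wraparound autovacuum is launched once a
-- table's age exceeds autovacuum_freeze_max_age, even if autovacuum
-- is otherwise disabled.
SELECT name, setting
  FROM pg_settings
  WHERE name IN ('autovacuum',
                 'autovacuum_freeze_max_age',
                 'vacuum_freeze_min_age',
                 'vacuum_freeze_table_age',
                 'autovacuum_max_workers',
                 'autovacuum_vacuum_cost_delay');
```

If the cost delay is high and worker count low, autovacuum may simply
not be able to keep up on a database of your size.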

> 3. How to detect such problem earlier?

We would need a description of the machine (cores, RAM, storage
system) and the output of these queries to be able to make good
suggestions on tuning autovacuum:

SELECT version();
SELECT name, current_setting(name), source
  FROM pg_settings
  WHERE source NOT IN ('default', 'override');

In addition, make sure that you are monitoring for long-running
transactions.  A reasonable monitoring scheme might be to alert
when either of these queries returns any rows:

select * from pg_stat_activity where xact_start < (now() - interval '1 hour');
select * from pg_prepared_xacts where prepared < (now() - interval '1 minute');

You can, of course, adjust the intervals to what makes the most
sense for your environment.  If you have max_prepared_transactions
set to zero, the latter query is not really necessary.
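
Along the same lines, a wraparound-specific check can catch this class
of problem long before the hard limit is reached.  This is a sketch
with an arbitrary threshold; pick one that gives you comfortable lead
time:

```sql
-- Example alert query: fire when any database has consumed more than
-- 1.5 billion transactions since its last full freeze.  The hard
-- limit where the server stops accepting commands is about 2.1
-- billion, so this leaves room to react.
SELECT datname, age(datfrozenxid) AS xid_age
  FROM pg_database
  WHERE age(datfrozenxid) > 1500000000;
```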

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
