Hello,

I've found a problem that an orphaned temporary table could cause XID 
wraparound.  Our customer encountered this problem with PG 9.5.2, but I think 
this will happen with the latest PG.

I'm willing to fix this, but I'd like to ask you what approach we should take.


PROBLEM
====================================

The customer has a database for application data, which I call it user_db here. 
 They don't store application data in postgres database.

No tables in user_db was autovacuumed for more than a month, leading to user 
tables bloating.  Those tables are eligible for autovacuum according to 
pg_stat_all_tables and autovacuum settings.

age(datfrozenxid) of user_db and postgres databases are greater than 
autovacuum_max_freeze_age, so they are eligible for autovacuuming for XID 
wraparound.

There are user tables in user_db whose age(relfrozenxid) is greater than 
autovacuum_freeze_max_age, so those tables should get autovacuum treatment.


CAUSE
====================================

postgres database has a table named pg_temp_3.fetchchunks, whose 
age(relfrozenxid) is greater than autovacuum_freeze_max_age.  This temporary 
table is the culprit.  pg_temp_3.fetchchunks is created by pg_rewind.  The 
customer says they ran pg_rewind.

autovacuum launcher always choose postgres, because do_start_worker() scans 
pg_database and finds that postgres database needs vacuuming for XID 
wraparound.  user_db is never chosen for vacuuming, although it also needs 
vacuuming for XID wraparound.

autovacuum worker doesn't delete pg_temp3.fetchchunks, because the backendid 3 
is used by some application so autovacuum worker thinks that the backend is 
active and the temporary table cannot be dropped.

I don't know why pg_temp3.fetchchunks still exists.  Maybe the user ran pg_ctl 
stop -mi while pg_rewind was running.


FIX
====================================

I have the following questions.  Along which line should I proceed to fix the 
problem?

* Why does autovacuum launcher always choose only one database when that 
database need vacuuming for XID wraparound?  Shouldn't it also choose other 
databases?

* I think temporary tables should not require vacuuming for XID wraparound.  
Furtherover, should updates/deletes to temporary tables  be in-place instead of 
creating garbage, so that any form of vacuum is unnecessary?  Other sessions do 
not need to read temporary tables.

* In this incident, autovacuum worker misjudged that pg_temp_3.fetchchunks 
can't be deleted, although the creator (pg_rewind) is no longer active.  How 
can we delete orphan temporary tables safely?


Regards
Takayuki Tsunakawa




Reply via email to