[Proposal] Global temporary tables

曾文旌(义从) Fri, 11 Oct 2019 06:17:21 -0700

Dear Hackers,

This propose a way to develop global temporary tables in PostgreSQL.

I noticed that there is an "Allow temporary tables to exist as empty by default
in all sessions" in the postgresql todolist.
https://wiki.postgresql.org/wiki/Todo <https://wiki.postgresql.org/wiki/Todo>

In recent years, PG community had many discussions about global temp table
(GTT) support. Previous discussion covered the following topics:
(1) The main benefit or function: GTT offers features like “persistent
schema, ephemeral data”, which avoids catalog bloat and reduces catalog vacuum.
(2) Whether follows ANSI concept of temporary tables
(3) How to deal with statistics, single copy of schema definition, relcache
(4) More can be seen in
https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru
(5) A recent implementation and design from Konstantin Knizhnik covered
many functions of GTT:
https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch

<https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch>

However, as pointed by Konstantin himself, the implementation still needs
functions related to CLOG, vacuum, and MVCC visibility.

We developed GTT based on PG 11 and included most needed features, such as how
to deal with concurrent DDL and DML operations, how to handle vacuum and too
old relfrozenxids, and how to store and access GTT statistics.

This design followed many suggestions from previous discussion in community.
Here are some examples:
“have a separate 'relpersistence' setting for global temp tables…by
having the backend id in all filename…. From Andres Freund
Use session memory context to store information related to GTT. From
Pavel Stehule
“extend the relfilenode mapper to support a backend-local
non-persistent relfilenode map that's used to track temp table and index
relfilenodes…” from Craig Ringer

Our implementation creates one record in pg_class for GTT’s schema definition.
When rows are first inserted into the GTT in a session, a session specific file
is created to store the GTT’s data. Those files are removed when the session
ends. We maintain the GTT’s statistics in session local memory. DDL operations,
such as DROP table or CREATE INDEX, can be executed on a GTT only by one
session, while no other sessions insert any data into the GTT before or it is
already truncated. This also avoids the concurrency of DML and DDL operations
on GTT. We maintain a session level oldest relfrozenxids for GTT. This way,
autovacuum or vacuum can truncate CLOG and increase global relfrozenxids based
on all tables’ relfrozenxids, including GTT’s.
The follows summarize the main design and implementation:
Syntax: ON COMMIT PRESERVE ROWS and ON COMMIT DELETE ROWS
Data storage and buffering follows the same way as local temp table
with a relfilenode including session id.
A hash table(A) in shared memory is used to track sessions and their
usage of GTTs and to serialize DDL and DML operations.
Another hash table(B) in session memory is introduced to record storage
files for GTTs and their indexes. When a session ends, those files are removed.
The same hash table(B) in session memory is used to record the
relfrozenxids of each GTT. The oldest one is stored in myproc so that
autovacuum and vacuum may use it to determine global oldest relfrozenxids and
truncate clog.
The same hash table(B) in session memory stores GTT’s session level
statistics, It is generated during the operations of vacuum and analyze, and
used by SQL optimizer to create execution plan.
Some utility functions are added for DBA to manage GTTs.
TRUNCATE command on a GTT behaves differently from that on a normal
table. The command deletes the data immediately but keeps relfilenode using
lower level table lock, RowExclusiveLock, instead of AccessExclusiveLock.
Main limits of this version or future improvement: need suggestions
from community:
1 VACUUM FULL and CLUSTER are not supported; any operations
which may change relfilenode are disabled to GTT.
2 Sequence column is not supported in GTT for now.
3 Users defined statistics is not supported.

Details:

Requirement
The features list about global temp table:
1. global temp table (ON COMMIT clause is omitted, SQL specifies that
the default behavior is ON COMMIT DELETE ROWS)
2. support with on commit DELETE ROWS
3. support with on commit PRESERVE ROWS
4. not support ON COMMIT DROP

Feature description
Global temp tables are defined just once and automatically exist (starting with
empty contents) in every session that needs them.
Global temp table, each session use local buffer, read or write independent
data files.
Use on commit DELETE ROWS for a transaction-specific global temp table. This is
the default. database will truncate the table (delete all its rows) after each
commit.
Use on commit PRESERVE ROWS Specify PRESERVE ROWS for a session-specific global
temp table. databse will truncate the table (delete all its rows) when you
terminate the session.

design
Global temp tables are designed based on local temp table(buffer and storage
files).
Because the catalog of global temp table is shared between sessions but the
data is not shared, we need to build some new mechanisms to manage non-shared
data and statistics for those data.

1. catalog
1.1 relpersistence
define RELPERSISTENCEGLOBALTEMP 'g'
Mark global temp table in pg_class relpersistence to 'T'. The relpersistence of
the index created on the global temp table is also set to ’T'

1.2 on commit clause
In local temp table on commit DELETE ROWS and on commit PRESERVE ROWS not store
in catalog, but GTT need.
Store a bool value oncommitdelete_rows to reloptions only for GTT and share
with other session.

2. gram.y
Global temp table already has a syntax tree. jush need to remove the warning
message "GLOBAL is deprecated in temporary table creation" and mark
relpersistence = RELPERSISTENCEGLOBALTEMP

3. STORAGE
3.1. active_gtt_shared_hash
create a hash table in shared memory to trace the GTT files that are
initialized in each session.
Each hash entry contains a bitmap that records the backendid of the initialized
GTT file.
With this hash table, we know which backend/session are using this GTT.
It will be used in GTT's DDL.

3.2. gtt_storage_local_hash
In each backend, create a local hashtable gtt_storage_local_hash for tracks GTT
storage file and statistics.
1). GTT storage file track
When one session inserts data into a GTT for the first time, record to local
hash.
2). normal clean GTT files
Use beforeshmemexit to ensure that all files for the session GTT are deleted
when the session exits.
3). abnormal situation file cleanup
When a backend exits abnormally (such as oom kill), the startup process started
to recovery before accept connect. startup process check and remove all GTT
files before redo wal.

4 DDL
4.1 DROP GTT
One GTT table is allowed to be deleted when only the current session USES it.
After get the AccessExclusiveLock of the GTT table, use active_gtt_shared_hash
to check and make sure that.

4.2 ALTER GTT
Same as drop GTT.

4.3 CREATE INDEX ON GTT, DROP INDEX ON GTT
Same as drop GTT.

4.4 TRUNCATE GTT
The truncate GTT use RowExclusiveLock, not AccessExclusiveLock, Because
truncate only cleans up local data file and local buffers in this session.
Also, truncate immediately deletes the data file without changing the
relfilenode of the GTT table. btw, I'm not sure the implementation will be
acceptable to the community.

4.5 create index on GTT
Same as drop GTT.

4.6 OTHERS
Any table operations about GTT that need to change relfilenode are disabled,
such as vacuum full/cluster.

5. The statistics of GTT
1 relpages reltuples relallvisible frozenxid minmulti from pg_class
2 The statistics for each column from pg_statistic
All the above information will be stored to gtt_storage_local_hash.
When vacuum or analyze GTT's statistic will update, and the planner will use
them. Of course, statistics only contain data within the current session.

5.1. View global temp table statistics
Provide pggttattstatistic get column statistics for GTT. Provide pggtt_relstats
to rel statistics for GTT.
These functions are implemented in a plug-in, without add system view or
function.

6. autovacuum
Autovacuum skips all GTT.

7. vacuum(frozenxid push, clog truncate)
The GTT data file contains transaction information. Queries for GTT data rely
on transaction information such as clog. That's can not be vacuumed
automatically by vacuum.
7.1 The session level gtt oldest frozenxid
When one GTT been create or remove, record the session level oldest frozenxid
and put it into MyProc.

7.1 vacuum
When vacuum push the db's frozenxid(vacupdatedatfrozenxid), need to consider
the GTT. It needs to calculate the transactions required for the GTT(search all
MyPorc), to avoid the clog required by GTT being cleaned.

8. Parallel query
Planner does not produce parallel query plans for SQL related to global temp
table.

9. Operability
Provide pggttattachedpid lists all the pids that are using the GTT. Provide
pglistgttrelfrozenxids lists the session level oldest frozenxid of using GTT.
These functions are implemented in a plug-in, without add system view or
function.
DBA can use the above function and pgterminatebackend to force the cleanup of
"too old" GTT tables and sessions.

10. Limitations and todo list
10.1. alter GTT
10.2. pg_statistic_ext
10.3. remove GTT's relfilenode can not change limit.
cluster/vacuum full, optimize truncate gtt.
10.4. SERIAL column type
The GTT from different sessions share a sequence(SERIAL type).
Need each session use the sequence independently.
10.5. Locking optimization for GTT.
10.6 materialized views is not support on GTT.

What do you thinking about this proposal?
Looking forward to your feedback.

Thanks!

regards

--
Zeng Wenjing
Alibaba Group-Database Products Business Unit

[Proposal] Global temporary tables

Reply via email to