Dear Hackers,

This propose a way to develop global temporary tables in PostgreSQL.

I noticed that there is an "Allow temporary tables to exist as empty by default 
in all sessions" in the postgresql todolist.
https://wiki.postgresql.org/wiki/Todo <https://wiki.postgresql.org/wiki/Todo>

In recent years, PG community had many discussions about global temp table 
(GTT) support. Previous discussion covered the following topics: 
(1)     The main benefit or function: GTT offers features like “persistent 
schema, ephemeral data”, which avoids catalog bloat and reduces catalog vacuum. 
(2)     Whether follows ANSI concept of temporary tables
(3)     How to deal with statistics, single copy of schema definition, relcache
(4)     More can be seen in 
https://www.postgresql.org/message-id/73954ab7-44d3-b37b-81a3-69bdcbb446f7%40postgrespro.ru
(5)     A recent implementation and design from Konstantin Knizhnik covered 
many functions of GTT: 
https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch
 
<https://www.postgresql.org/message-id/attachment/103265/global_private_temp-1.patch>

However, as pointed by Konstantin himself, the implementation still needs 
functions related to CLOG, vacuum, and MVCC visibility.

We developed GTT based on PG 11 and included most needed features, such as how 
to deal with concurrent DDL and DML operations, how to handle vacuum and too 
old relfrozenxids, and how to store and access GTT statistics. 

This design followed many suggestions from previous discussion in community. 
Here are some examples:
        “have a separate 'relpersistence' setting for global temp tables…by 
having the backend id in all filename….   From Andres Freund
        Use session memory context to store information related to GTT.   From 
Pavel Stehule
        “extend the relfilenode mapper to support a backend-local 
non-persistent relfilenode map that's used to track temp table and index 
relfilenodes…” from Craig Ringer

Our implementation creates one record in pg_class for GTT’s schema definition. 
When rows are first inserted into the GTT in a session, a session specific file 
is created to store the GTT’s data. Those files are removed when the session 
ends. We maintain the GTT’s statistics in session local memory. DDL operations, 
such as DROP table or CREATE INDEX, can be executed on a GTT only by one 
session, while no other sessions insert any data into the GTT before or it is 
already truncated. This also avoids the concurrency of DML and DDL operations 
on GTT. We maintain a session level oldest relfrozenxids for GTT. This way, 
autovacuum or vacuum can truncate CLOG and increase global relfrozenxids based 
on all tables’ relfrozenxids, including GTT’s. 
The follows summarize the main design and implementation: 
        Syntax: ON COMMIT PRESERVE ROWS and ON COMMIT DELETE ROWS
        Data storage and buffering follows the same way as local temp table 
with a relfilenode including session id.
        A hash table(A) in shared memory is used to track sessions and their 
usage of GTTs and to serialize DDL and DML operations. 
        Another hash table(B) in session memory is introduced to record storage 
files for GTTs and their indexes. When a session ends, those files are removed. 
        The same hash table(B) in session memory is used to record the 
relfrozenxids of each GTT. The oldest one is stored in myproc so that 
autovacuum and vacuum may use it to determine global oldest relfrozenxids and 
truncate clog. 
        The same hash table(B) in session memory stores GTT’s session level 
statistics, It is generated during the operations of vacuum and analyze, and 
used by SQL optimizer to create execution plan. 
        Some utility functions are added for DBA to manage GTTs. 
        TRUNCATE command on a GTT behaves differently from that on a normal 
table. The command deletes the data immediately but keeps relfilenode using 
lower level table lock, RowExclusiveLock, instead of  AccessExclusiveLock. 
        Main limits of this version or future improvement: need suggestions 
from community: 
                1 VACUUM FULL and CLUSTER are not supported; any operations 
which may change relfilenode are disabled to GTT.
                2 Sequence column is not supported in GTT for now.
                3 Users defined statistics is not supported.


Details:

Requirement
The features list about global temp table:
        1. global temp table (ON COMMIT clause is omitted, SQL specifies that 
the default behavior is ON COMMIT DELETE ROWS)
        2. support with on commit DELETE ROWS
        3. support with on commit PRESERVE ROWS
        4. not support ON COMMIT DROP

Feature description
Global temp tables are defined just once and automatically exist (starting with 
empty contents) in every session that needs them.
Global temp table, each session use local buffer, read or write independent 
data files.
Use on commit DELETE ROWS for a transaction-specific global temp table. This is 
the default. database will truncate the table (delete all its rows) after each 
commit.
Use on commit PRESERVE ROWS Specify PRESERVE ROWS for a session-specific global 
temp table. databse will truncate the table (delete all its rows) when you 
terminate the session.

design
Global temp tables are designed based on local temp table(buffer and storage 
files). 
Because the catalog of global temp table is shared between sessions but the 
data is not shared, we need to build some new mechanisms to manage non-shared 
data and statistics for those data.

1. catalog
1.1 relpersistence
define RELPERSISTENCEGLOBALTEMP 'g'
Mark global temp table in pg_class relpersistence to 'T'. The relpersistence of 
the index created on the global temp table is also set to ’T'

1.2 on commit clause
In local temp table on commit DELETE ROWS and on commit PRESERVE ROWS not store 
in catalog, but GTT need.
Store a bool value oncommitdelete_rows to reloptions only for GTT and share 
with other session.

2. gram.y
Global temp table already has a syntax tree. jush need to remove the warning 
message "GLOBAL is deprecated in temporary table creation" and mark 
relpersistence = RELPERSISTENCEGLOBALTEMP

3. STORAGE
3.1. active_gtt_shared_hash
create a hash table in shared memory to trace the GTT files that are 
initialized in each session. 
Each hash entry contains a bitmap that records the backendid of the initialized 
GTT file.
With this hash table, we know which backend/session are using this GTT.
It will be used in GTT's DDL.

3.2. gtt_storage_local_hash
In each backend, create a local hashtable gtt_storage_local_hash for tracks GTT 
storage file and statistics.
1). GTT storage file track
When one session inserts data into a GTT for the first time, record to local 
hash.
2). normal clean GTT files
Use beforeshmemexit to ensure that all files for the session GTT are deleted 
when the session exits.
3). abnormal situation file cleanup
When a backend exits abnormally (such as oom kill), the startup process started 
to recovery before accept connect. startup process check and remove all GTT 
files before redo wal.

4 DDL
4.1 DROP GTT
One GTT table is allowed to be deleted when only the current session USES it. 
After get the AccessExclusiveLock of the GTT table, use active_gtt_shared_hash 
to check and make sure that.

4.2 ALTER GTT
Same as drop GTT.

4.3 CREATE INDEX ON GTT, DROP INDEX ON GTT
Same as drop GTT.

4.4 TRUNCATE GTT
The truncate GTT use RowExclusiveLock, not AccessExclusiveLock, Because 
truncate only cleans up local data file and local buffers in this session.
Also, truncate immediately deletes the data file without changing the 
relfilenode of the GTT table. btw, I'm not sure the implementation will be 
acceptable to the community.

4.5  create index on GTT
Same as drop GTT.

4.6 OTHERS
Any table operations about GTT that need to change relfilenode are disabled, 
such as vacuum full/cluster.

5. The statistics of GTT
        1 relpages reltuples relallvisible frozenxid minmulti from pg_class
        2 The statistics for each column from pg_statistic
All the above information will be stored to gtt_storage_local_hash.
When vacuum or analyze GTT's statistic will update, and the planner will use 
them. Of course, statistics only contain data within the current session.

5.1. View global temp table statistics
Provide pggttattstatistic get column statistics for GTT. Provide pggtt_relstats 
to rel statistics for GTT.
These functions are implemented in a plug-in, without add system view or 
function.

6. autovacuum
Autovacuum skips all GTT.

7. vacuum(frozenxid push, clog truncate)
The GTT data file contains transaction information. Queries for GTT data rely 
on transaction information such as clog. That's can not be vacuumed 
automatically by vacuum.
7.1 The session level gtt oldest frozenxid
When one GTT been create or remove, record the session level oldest frozenxid 
and put it into MyProc. 

7.1 vacuum
When vacuum push the db's frozenxid(vacupdatedatfrozenxid), need to consider 
the GTT. It needs to calculate the transactions required for the GTT(search all 
MyPorc), to avoid the clog required by GTT being cleaned.

8. Parallel query
Planner does not produce parallel query plans for SQL related to global temp 
table.

9. Operability
Provide pggttattachedpid lists all the pids that are using the GTT. Provide 
pglistgttrelfrozenxids lists the session level oldest frozenxid of using GTT.
These functions are implemented in a plug-in, without add system view or 
function.
DBA can use the above function and pgterminatebackend to force the cleanup of 
"too old" GTT tables and sessions.

10. Limitations and todo list
10.1. alter GTT
10.2. pg_statistic_ext
10.3. remove GTT's relfilenode can not change limit.
cluster/vacuum full, optimize truncate gtt.
10.4. SERIAL column type
The GTT from different sessions share a sequence(SERIAL type).
Need each session use the sequence independently.
10.5. Locking optimization for GTT.
10.6 materialized views is not support on GTT.


What do you thinking about this proposal?
Looking forward to your feedback.

Thanks!


regards

--
Zeng Wenjing
Alibaba Group-Database Products Business Unit


Reply via email to