dukeofhazardz commented on issue #354:
URL: https://github.com/apache/age/issues/354#issuecomment-1500500769

   This is a summary of the first chapter of the book 'Internals of PostgreSQL'
   
   The first chapter of 'The Internals of PostgreSQL' covers the logical and 
physical structures of a database cluster, including the definition of a 
database cluster, a collection of databases managed by a PostgreSQL server.
   
   The chapter describes the relationship between databases and database 
objects, which are data structures used to store or reference data, and the use 
of object identifiers (OIDs) to manage these objects and how to find the OIDs 
of databases and heap tables by querying system catalogs such as pg_database 
and pg_class.
   
   The chapter also describes the physical structure of a PostgreSQL database 
cluster. The database cluster is a directory referred to as the "base 
directory" that contains many subdirectories and files. The database cluster is 
created using the initdb utility and is typically located in the PGDATA 
environment variable. Each database in the cluster is a subdirectory under the 
"base" subdirectory, and tables and indexes are stored as files under the 
database subdirectory. PostgreSQL also supports tablespaces, which are 
directories that contain data outside of the base directory.
   
   The chapter describes the layout of databases and files associated with 
tables and indexes in a PostgreSQL database cluster. Each database has a 
subdirectory named after its respective OID under the base subdirectory. Tables 
and indexes smaller than 1GB are stored as a single file under the database 
directory they belong to. These data files are managed by the relfilenode 
variable, and their values basically match their respective OIDs. However, the 
relfilenode values can be changed using commands like TRUNCATE, REINDEX, and 
CLUSTER. For example, if a table is truncated, a new relfilenode is assigned, 
the old data file is removed, and a new one is created.
   
   The chapter covers the layout of files associated with tables and indexes in 
a PostgreSQL database cluster. Each table or index is a single file stored 
under the database directory it belongs to, and the filenames are identified by 
the respective OIDs. The tables have two associated files, the free space map 
and the visibility map, which store information on the free space capacity and 
the visibility on each page within the table file, respectively. Indexes only 
have individual free space maps and do not have a visibility map. These files 
may be internally referred to as the forks of each relation. PostgreSQL also 
allows the creation of tablespaces, which are additional data areas outside the 
base directory, and the version-specific subdirectory will be created under the 
directory specified in the CREATE TABLESPACE statement.
   
   The data file in PostgreSQL is divided into pages or blocks of fixed length. 
The pages are numbered sequentially from 0 and have block numbers. The internal 
layout of the pages consists of heap tuples, line pointers, and header data. 
The header data contains general information about the page such as pd_lsn, 
pd_checksum, pd_lower, pd_upper, and pd_special. To identify a tuple, tuple 
identifier (TID) is used, which comprises a pair of values: the block number of 
the page that contains the tuple and the offset number of the line pointer that 
points to the tuple. Heap tuples whose size is greater than about 2 KB are 
managed using the TOAST technique.
   
   The chapter concludes by discussing the methods of writing and reading heap 
tuples. When inserting a new tuple, a new line pointer is added to the page and 
the pd_lower and pd_upper pointers are updated accordingly. Sequential scan 
involves reading all tuples in all pages by scanning all line pointers in each 
page, while B-tree index scan involves reading index tuples containing index 
keys and TIDs pointing to the target heap tuple, which can then be directly 
read without unnecessary scanning in the pages.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@age.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to