dukeofhazardz commented on issue #354: URL: https://github.com/apache/age/issues/354#issuecomment-1500500769
This is a summary of the first chapter of the book 'Internals of PostgreSQL' The first chapter of 'The Internals of PostgreSQL' covers the logical and physical structures of a database cluster, including the definition of a database cluster, a collection of databases managed by a PostgreSQL server. The chapter describes the relationship between databases and database objects, which are data structures used to store or reference data, and the use of object identifiers (OIDs) to manage these objects and how to find the OIDs of databases and heap tables by querying system catalogs such as pg_database and pg_class. The chapter also describes the physical structure of a PostgreSQL database cluster. The database cluster is a directory referred to as the "base directory" that contains many subdirectories and files. The database cluster is created using the initdb utility and is typically located in the PGDATA environment variable. Each database in the cluster is a subdirectory under the "base" subdirectory, and tables and indexes are stored as files under the database subdirectory. PostgreSQL also supports tablespaces, which are directories that contain data outside of the base directory. The chapter describes the layout of databases and files associated with tables and indexes in a PostgreSQL database cluster. Each database has a subdirectory named after its respective OID under the base subdirectory. Tables and indexes smaller than 1GB are stored as a single file under the database directory they belong to. These data files are managed by the relfilenode variable, and their values basically match their respective OIDs. However, the relfilenode values can be changed using commands like TRUNCATE, REINDEX, and CLUSTER. For example, if a table is truncated, a new relfilenode is assigned, the old data file is removed, and a new one is created. The chapter covers the layout of files associated with tables and indexes in a PostgreSQL database cluster. Each table or index is a single file stored under the database directory it belongs to, and the filenames are identified by the respective OIDs. The tables have two associated files, the free space map and the visibility map, which store information on the free space capacity and the visibility on each page within the table file, respectively. Indexes only have individual free space maps and do not have a visibility map. These files may be internally referred to as the forks of each relation. PostgreSQL also allows the creation of tablespaces, which are additional data areas outside the base directory, and the version-specific subdirectory will be created under the directory specified in the CREATE TABLESPACE statement. The data file in PostgreSQL is divided into pages or blocks of fixed length. The pages are numbered sequentially from 0 and have block numbers. The internal layout of the pages consists of heap tuples, line pointers, and header data. The header data contains general information about the page such as pd_lsn, pd_checksum, pd_lower, pd_upper, and pd_special. To identify a tuple, tuple identifier (TID) is used, which comprises a pair of values: the block number of the page that contains the tuple and the offset number of the line pointer that points to the tuple. Heap tuples whose size is greater than about 2 KB are managed using the TOAST technique. The chapter concludes by discussing the methods of writing and reading heap tuples. When inserting a new tuple, a new line pointer is added to the page and the pd_lower and pd_upper pointers are updated accordingly. Sequential scan involves reading all tuples in all pages by scanning all line pointers in each page, while B-tree index scan involves reading index tuples containing index keys and TIDs pointing to the target heap tuple, which can then be directly read without unnecessary scanning in the pages. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@age.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org