Re: Add A Glossary
On 2020-Jul-21, Jürgen Purtz wrote:

> - Added '(process)' to the two terms 'Autovacuum' and 'Stats Collector'
>
> - Removed link to himself in 'Logger (process)'
>
> - new term: Base Backup

Pushed. I was not courageous enough to include "base backup" in 13, so that one's in master only, but the other ones are in both branches.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add A Glossary
On 19.06.20 19:10, Alvaro Herrera wrote:

> Thanks for these fixes! I included all of these.
>
> On 2020-Jun-19, Erik Rijkers wrote:
>
> > And one thing that I am not sure of (but strikes me as a bit odd): there are several cases of 'are enforced unique'. Should that not be 'are enforced to be unique' ?
>
> I included this change too; I am not too sure of it myself. If some English language neatnik wants to argue one way or the other, be my guest.

- Added '(process)' to the two terms 'Autovacuum' and 'Stats Collector'

- Removed link to himself in 'Logger (process)'

- new term: Base Backup

--
Jürgen Purtz

diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 76525c6302..58e5071642 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -108,7 +108,7 @@
- Autovacuum + Autovacuum (process) A set of background processes that routinely perform
@@ -178,6 +178,19 @@
+ + Base Backup + + + A binary copy of all + database cluster + files. It is generated by the tool . + In combination with WAL files it can be used as the starting point + for recovery, log shipping, or streaming replication. + + + + Bloat
@@ -855,8 +868,7 @@
Logger (process) - If activated, the - Logger process + If activated, the process writes information about database events into the current log file. When reaching certain time- or
@@ -1486,7 +1498,7 @@
- Stats collector + Stats collector (process) This process collects statistical information about the
Re: Add A Glossary
Thanks for these fixes! I included all of these.

On 2020-Jun-19, Erik Rijkers wrote:

> And one thing that I am not sure of (but strikes me as a bit odd):
> there are several cases of
> 'are enforced unique'. Should that not be
> 'are enforced to be unique' ?

I included this change too; I am not too sure of it myself. If some English language neatnik wants to argue one way or the other, be my guest.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add A Glossary
On 2020-06-19 01:51, Alvaro Herrera wrote:
On 2020-Jun-16, Justin Pryzby wrote:
On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:

I noticed one typo:

'aggregates functions' should be 'aggregate functions'

And one thing that I am not sure of (but strikes me as a bit odd): there are several cases of 'are enforced unique'. Should that not be 'are enforced to be unique' ?

Another small mistake (2x):

'The name of such objects of the same type are' should be 'The names of such objects of the same type are'

(this phrase occurs 2x wrong, 1x correct)

thanks,

Erik Rijkers
Re: Add A Glossary
On 2020-Jun-16, Justin Pryzby wrote:

> On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:

Thanks for the review. I merged all your suggestions.

This one:

> >Most local objects belong to a specific
> > + schema in their
> > + containing database, such as
> > + all types of
> > relations,
> > + all types of
> > functions,
>
> Maybe say: >Relations< (all types), and >Functions< (all types)

led me down not one but two rabbit holes; first I realized that "functions" is an insufficient term since procedures should also be included but weren't, so I had to add the more generic term "routine" and then modify the definitions of all routine types to mix in well. I think overall the quality of these definitions is improved as a result. I also felt the need to revise the definition of "relations", so I did that too; this made me change the definition of resultset too.

On 2020-Jun-17, Jürgen Purtz wrote:

> +1, with two formal changes:
>
> - Rearrangement of term "Data page" to meet alphabetical order.

To forestall these ordering issues (look, another rabbit hole), I grepped the file for all glossterms and sorted that under en_US rules, then reordered the terms to match that. Turns out there were several other ordering mistakes.

    git grep '' | sed -e 's/<[^>]*>\([^<]*\)<[^>]*>/\1/' > orig
    LC_COLLATE=en_US.UTF-8 sort orig > sorted

(Eliminating the tags is important, otherwise the sort uses the tags themselves to disambiguate)

> One last question: The definition of "Data directory" reads "... A cluster's
> storage space comprises the data directory plus ..." and 'cluster' links to
> '"glossary-instance". Shouldn't it link to "glossary-db-cluster"?

Yes, an oversight, thanks.

I also added TPS, because I had already written it.
-- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml index 25b03f3b37..5274feabba 100644 --- a/doc/src/sgml/glossary.sgml +++ b/doc/src/sgml/glossary.sgml @@ -23,7 +23,7 @@ - Aggregate function + Aggregate function (routine) A function that @@ -39,6 +39,11 @@ + + Analytic function + + + Analyze (operation) @@ -57,11 +62,6 @@ - - Analytic function - - - Atomic @@ -389,40 +389,33 @@ - - Data directory + + Database - The base directory on the filesystem of a - server that contains all - data files and subdirectories associated with an - instance (with the - exception of tablespaces). - The environment variable PGDATA is commonly used to - refer to the - data directory. - - - An instance's storage - space comprises the data directory plus any additional tablespaces. + A named collection of + local SQL objects. For more information, see - . + . - - Database + + Database cluster - A named collection of - SQL objects. + A collection of databases and global SQL objects, + and their common static and dynamic metadata. + Sometimes referred to as a + cluster. - For more information, see - . + In PostgreSQL, the term + cluster is also sometimes used to refer to an instance. + (Don't confuse this term with the SQL command CLUSTER.) @@ -432,6 +425,31 @@ + + Data directory + + + The base directory on the filesystem of a + server that contains all + data files and subdirectories associated with a + database cluster + (with the exception of + tablespaces, + and optionally WAL). + The environment variable PGDATA is commonly used to + refer to the data directory. + + + A cluster's storage + space comprises the data directory plus any additional tablespaces. + + + For more information, see + . 
+ + + + Data page @@ -578,7 +596,7 @@ - Foreign table + Foreign table (relation) A relation which appears to have @@ -631,12 +649,20 @@ - Function + Function (routine) - Any defined transformation of data. Many functions are already defined - within PostgreSQL itself, but user-defined - ones can also be added. + A type of routine that receives zero or more arguments, returns zero or more + output values, and is constrained to run within one transaction. + Functions are invoked as part of a query, for example via + SELECT. + Certain functions can return + sets; those are + called set-returning functions. + + + Functions can also be used for + triggers to invoke. For more information, see @@ -689,13 +715,12 @@ - Index + Index (relation) A relation
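The ordering check described in the message above (collect the glossary terms, strip the tags, sort, compare) can be approximated in Python. This is only a sketch under stated assumptions: the tag name `glossterm` is inferred from the message's wording, the `SAMPLE` text is made up, and `collation_key` is a rough stand-in for en_US collation (it ignores case, spaces, and punctuation), not the real locale rules.

```python
import re

# Hypothetical sample input; the tag name <glossterm> is an assumption
# based on the message's wording ("grepped the file for all glossterms").
SAMPLE = """
<glossterm>Analytic function</glossterm>
<glossterm>Analyze (operation)</glossterm>
<glossterm>Data page</glossterm>
<glossterm>Database</glossterm>
<glossterm>Data directory</glossterm>
"""


def extract_terms(sgml):
    """Return the text found inside each <glossterm>...</glossterm> pair."""
    return re.findall(r"<glossterm>([^<]*)</glossterm>", sgml)


def collation_key(term):
    """Rough approximation of en_US ordering: drop case, spaces, punctuation."""
    return "".join(ch for ch in term.casefold() if ch.isalnum())


def out_of_order(terms):
    """Return adjacent pairs whose order disagrees with collation_key."""
    return [
        (first, second)
        for first, second in zip(terms, terms[1:])
        if collation_key(first) > collation_key(second)
    ]


if __name__ == "__main__":
    for first, second in out_of_order(extract_terms(SAMPLE)):
        print(f"{first!r} should come after {second!r}")
```

With this key, "Data page" listed before "Database" is flagged, which matches the reordering in the patch (Database, Database cluster, Data directory, Data page). A byte-wise sort would instead place "Data page" first because the space sorts before letters.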
Re: Add A Glossary
On 17.06.20 02:09, Alvaro Herrera wrote: On 2020-Jun-09, Jürgen Purtz wrote: Can you agree to the following definitions? If no, we can alternatively formulate for each of them: "Under discussion - currently not defined". My proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server instance constitutes a database cluster." After sleeping on it a few more times, I don't oppose the idea of making "instance" be the running state and "database cluster" the on-disk stuff that supports the instance. Here's a patch that does things pretty much along the lines you suggested. I made small adjustments to "SQL objects": * SQL objects in schemas were said to have their names unique in the schema, but we failed to say anything about names of objects not in schemas and global objects. Added that. * Had example object types for global objects and objects not in schemas, but no examples for objects in schemas. Added that. Some programs whose output we could tweak per this: pg_ctl pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server. -D, --pgdata=DATADIR location of the database storage area to: pg_ctl is a utility to initialize or control a PostgreSQL database cluster. -D, --pgdata=DATADIR location of the database directory pg_basebackup: pg_basebackup takes a base backup of a running PostgreSQL server. to: pg_basebackup takes a base backup of a PostgreSQL instance. +1, with two formal changes: - Rearrangement of term "Data page" to meet alphabetical order. - Add in one case to meet xml-well-formedness. One last question: The definition of "Data directory" reads "... A cluster's storage space comprises the data directory plus ..." and 'cluster' links to '"glossary-instance". Shouldn't it link to "glossary-db-cluster"? 
-- Jürgen Purtz diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml index e29b55e5ac..0499f9044f 100644 --- a/doc/src/sgml/glossary.sgml +++ b/doc/src/sgml/glossary.sgml @@ -413,6 +413,22 @@ + + Data page + + + The basic structure used to store relation data. + All pages are of the same size. + Data pages are typically stored on disk, each in a specific file, + and can be read to shared buffers + where they can be modified, becoming + dirty. They become clean when written + to disk. New pages, which initially exist in memory only, are also + dirty until written. + + + + Database @@ -441,6 +457,7 @@ cluster is also sometimes used to refer to an instance. (Don't confuse this term with the SQL command CLUSTER.) + @@ -448,22 +465,6 @@ - - Data page - - - The basic structure used to store relation data. - All pages are of the same size. - Data pages are typically stored on disk, each in a specific file, - and can be read to shared buffers - where they can be modified, becoming - dirty. They become clean when written - to disk. New pages, which initially exist in memory only, are also - dirty until written. - - - - Datum
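The "Data page" entry moved in the patch above describes a small life cycle: pages are clean when read from disk, dirty once modified, clean again once written, and new pages start out dirty. A minimal Python sketch of that wording follows. The `Page` class and its helper functions are hypothetical names for illustration only, and the 8192-byte size is an assumed value (the entry only says all pages have the same size); none of this reflects PostgreSQL's actual buffer manager.

```python
from dataclasses import dataclass

PAGE_SIZE = 8192  # assumed size for illustration; all pages share one size


@dataclass
class Page:
    """Hypothetical in-memory copy of a data page."""
    data: bytes
    dirty: bool


def new_page():
    # A new page exists in memory only, so it is dirty until written.
    return Page(data=bytes(PAGE_SIZE), dirty=True)


def read_page(on_disk):
    # A page just read into memory matches its on-disk image: clean.
    return Page(data=on_disk, dirty=False)


def modify(page, data):
    # Any modification in memory makes the page dirty.
    page.data = data
    page.dirty = True


def write_out(page):
    # Writing the page to disk makes it clean again.
    page.dirty = False
    return page.data
```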
Re: Add A Glossary
On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote: > diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml > index 25b03f3b37..e29b55e5ac 100644 > --- a/doc/src/sgml/glossary.sgml > +++ b/doc/src/sgml/glossary.sgml > @@ -395,15 +395,15 @@ > > The base directory on the filesystem of a > server that contains > all > - data files and subdirectories associated with an > - instance (with the > - exception of linkend="glossary-tablespace">tablespaces). > + data files and subdirectories associated with a > + database cluster > + (with the exception of > + tablespaces). and (optionally) WAL > + > + Database cluster > + > + > + A collection of databases and global SQL objects, > + and their common static and dynamic meta-data. metadata > @@ -1245,12 +1255,17 @@ > SQL objects, > which all reside in the same > database. > - Each SQL object must reside in exactly one schema. > + Each SQL object must reside in exactly one schema > + (though certain types of SQL objects exist outside schemas). (except for global objects which ..) > > The names of SQL objects of the same type in the same schema are > enforced > to be unique. > There is no restriction on reusing a name in multiple schemas. > + For local objects that exist outside schemas, their names are enforced > + unique across the whole database. For global objects, their names I would say "unique within the database" > + are enforced unique across the whole > + database cluster. and "unique within the whole db cluster" >Most local objects belong to a specific > - schema in their > containing database. > + schema in their > + containing database, such as > + all types of > relations, > + all types of > functions, Maybe say: >Relations< (all types), and >Functions< (all types) > used as the default one for all SQL objects, called > pg_default. > > "the default" (remove "one") -- Justin
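The three uniqueness scopes discussed in this review (within a schema, within the database, within the whole database cluster) can be pictured as differently sized lookup keys. The sketch below is an illustration only: `uniqueness_scope` is a hypothetical helper, and the object kinds used with it ("table", "extension", "role") are examples chosen for this sketch, not taken from the patch.

```python
from typing import Optional


def uniqueness_scope(kind: str, name: str, *, cluster: str,
                     database: Optional[str] = None,
                     schema: Optional[str] = None) -> tuple:
    """Build the key within which (kind, name) must be unique."""
    if database is None:
        # Global object: unique within the whole database cluster.
        return (cluster, kind, name)
    if schema is None:
        # Local object outside schemas: unique within the database.
        return (cluster, database, kind, name)
    # Object in a schema: unique per object type within that schema.
    return (cluster, database, schema, kind, name)


if __name__ == "__main__":
    # The same table name may be reused in two schemas: the keys differ.
    print(uniqueness_scope("table", "t", cluster="main",
                           database="db1", schema="public"))
    print(uniqueness_scope("table", "t", cluster="main",
                           database="db1", schema="audit"))
```

Two objects collide exactly when their keys are equal, so a set of keys is enough to model the "enforced to be unique" rule at each level.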
Re: Add A Glossary
On 2020-Jun-09, Jürgen Purtz wrote: > Can you agree to the following definitions? If no, we can alternatively > formulate for each of them: "Under discussion - currently not defined". My > proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into > databases, and a collection of databases managed by a single PostgreSQL > server instance constitutes a database cluster." After sleeping on it a few more times, I don't oppose the idea of making "instance" be the running state and "database cluster" the on-disk stuff that supports the instance. Here's a patch that does things pretty much along the lines you suggested. I made small adjustments to "SQL objects": * SQL objects in schemas were said to have their names unique in the schema, but we failed to say anything about names of objects not in schemas and global objects. Added that. * Had example object types for global objects and objects not in schemas, but no examples for objects in schemas. Added that. Some programs whose output we could tweak per this: pg_ctl > pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL > server. > -D, --pgdata=DATADIR location of the database storage area to: > pg_ctl is a utility to initialize or control a PostgreSQL database cluster. > -D, --pgdata=DATADIR location of the database directory pg_basebackup: > pg_basebackup takes a base backup of a running PostgreSQL server. to: > pg_basebackup takes a base backup of a PostgreSQL instance. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml index 25b03f3b37..e29b55e5ac 100644 --- a/doc/src/sgml/glossary.sgml +++ b/doc/src/sgml/glossary.sgml @@ -395,15 +395,15 @@ The base directory on the filesystem of a server that contains all - data files and subdirectories associated with an - instance (with the - exception of tablespaces). 
+ data files and subdirectories associated with a + database cluster + (with the exception of + tablespaces). The environment variable PGDATA is commonly used to - refer to the - data directory. + refer to the data directory. - An instance's storage + A cluster's storage space comprises the data directory plus any additional tablespaces. @@ -418,7 +418,7 @@ A named collection of - SQL objects. + local SQL objects. For more information, see @@ -427,6 +427,22 @@ + + Database cluster + + + A collection of databases and global SQL objects, + and their common static and dynamic meta-data. + Sometimes referred to as a + cluster. + + + In PostgreSQL, the term + cluster is also sometimes used to refer to an instance. + (Don't confuse this term with the SQL command CLUSTER.) + + + Database server @@ -634,7 +650,7 @@ Function - Any defined transformation of data. Many functions are already defined + A defined transformation of data. Many functions are already defined within PostgreSQL itself, but user-defined ones can also be added. @@ -724,14 +740,12 @@ Instance - A set of databases and accompanying global SQL objects that are stored in - the same data directory - in a single server. - If running, one + A group of backend and auxiliary processes that communicate using + a common shared memory area. One postmaster process - manages a group of backend and auxiliary processes that communicate - using a common shared memory - area. Many instances can run on the same + manages the instance; one instance manages exactly one + database cluster + with all its databases. Many instances can run on the same server as long as their TCP ports do not conflict. @@ -739,14 +753,10 @@ The instance handles all key features of a DBMS: read and write access to files and shared memory, assurance of the ACID properties, - connections to client processes, + connections to + client processes, privilege verification, crash recovery, replication, etc. 
- - In PostgreSQL, the term - cluster is also sometimes used to refer to an instance. - (Don't confuse this term with the SQL command CLUSTER.) - @@ -1245,12 +1255,17 @@ SQL objects, which all reside in the same database. - Each SQL object must reside in exactly one schema. + Each SQL object must reside in exactly one schema + (though certain types of SQL objects exist outside schemas). The names of SQL objects of the same type in the same schema are enforced to be unique. There is no restriction on reusing a name in multiple schemas. + For local objects that
Re: Add A Glossary
On 17.05.20 17:28, Alvaro Herrera wrote: I think the terms under discussion are just * cluster * instance * server Despite the short period of its existence the glossary achieved some importance, see: https://www.postgresql.org/message-id/b8e12875ebec9e6d3107df5fa1129e1e%40postgrespro.ru . We have to be careful with publications. It's not acceptable that we change definitions from release to release. Therefore IMO we should mark or even ignore such terms for which we cannot reach consensus. Can you agree to the following definitions? If no, we can alternatively formulate for each of them: "Under discussion - currently not defined". My proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server instance constitutes a database cluster." - "Database" (No change to existing definition): "A named collection of SQL objects." - "Database Cluster", "Cluster" (New definition and rearrangements of some sentences): "A collection of related databases, and their common static and dynamic meta-data. This term is sometimes used to refer to an instance. (Don't confuse the term CLUSTER with the SQL command CLUSTER.)" - "Data Directory" (Replaced 'instance' by 'cluster'): "The base directory on the filesystem of a server that contains all data files and subdirectories associated with a cluster (with the exception of tablespaces). The environment variable PGDATA is commonly used to refer to the data directory. A cluster's storage space comprises the data directory plus any additional tablespaces. For more information, see Section 68.1." - "Database Server", "Instance" (Major changes): "A group of backend and auxiliary processes that communicate using a common shared memory area. One postmaster process manages the instance; one instance manages exactly one cluster with all its databases. Many instances can run on the same server as long as their TCP ports do not conflict. 
The instance handles all key features of a DBMS: read and write access to files and shared memory, assurance of the ACID properties, connections to client processes, privilege verification, crash recovery, replication, etc." - "Server" (No change to existing definition): "A computer on which PostgreSQL instances run. The term server denotes real hardware, a container, or a virtual machine. This term is sometimes used to refer to an instance or to a host." - "Host" (No change to existing definition): "A computer that communicates with other computers over a network. This is sometimes used as a synonym for server. It is also used to refer to a computer where client processes run." -- Jürgen Purtz
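The relationships in the proposed definitions above (one instance manages exactly one cluster; many instances can run on one server as long as their TCP ports do not conflict) can be written down as a small data model. This is a sketch for illustration only: the class and attribute names are hypothetical, and "server" is used here in the sense of the proposal (a computer on which instances run), which other participants in the thread dispute.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseCluster:
    """On-disk side: a data directory holding a collection of databases."""
    data_directory: str
    databases: list = field(default_factory=list)


@dataclass
class Instance:
    """Running side: processes plus shared memory, serving one cluster."""
    port: int
    cluster: DatabaseCluster  # exactly one cluster per instance


@dataclass
class Server:
    """A machine (real or virtual) on which instances run."""
    name: str
    instances: list = field(default_factory=list)

    def start(self, instance: Instance) -> None:
        # Many instances may share a server if their TCP ports differ.
        if any(i.port == instance.port for i in self.instances):
            raise ValueError(f"TCP port {instance.port} already in use")
        self.instances.append(instance)
```

Keeping `DatabaseCluster` separate from `Instance` mirrors the static/dynamic split argued for in the thread: the cluster object exists whether or not any instance has been started for it.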
Re: Add A Glossary
On Wed, 2020-05-20 at 13:17 +0200, Jürgen Purtz wrote: > > FWIW, I feel somewhat like Alvaro on that point; I use those terms > > synonymously, > > perhaps distinguishing between a "started cluster" and a "stopped cluster". > > After all, "cluster" refers to "a cluster of databases", which are there, > > regardless > > if you start the server or not. > > > > The term "cluster" is unfortunate, because to most people it suggests a > > group of > > machines, so the term "instance" is better, but that ship has sailed long > > ago. > > > > The static part of a cluster to me is the "data directory". > > cluster/instance: The different nature (static/dynamic) of what I > call "cluster" and "instance" as well as the existence of the two > commands "initdb — create a new PostgreSQL database cluster" and > "pg_ctl — initialize, start, stop, or control a PostgreSQL server" > confirms me in my opinion that we need two different terms for > them. I think that the "pg_ctl" example does not apply: It does not talk about starting the cluster, but about starting the server process, that is "server" in the way I understand it. > There are situations where we need a single term for both of > them. "Instance and its data directory" or "Instance and its > cluster" are too wordy. In many cases we use "database server" or > "server" in this sense. Imo "Server" is too short and ambiguous. > "database server", the plural form "databases server", or the new > term "cluster server", which is more accurate, would be ok for me. > (Similar to "server", the term "cluster" is also used in many > different contexts - but only outside of the PG world; within our > context "cluster" is not ambiguous.) That does not feel right to me. "cluster server", ouch. "databases server", ouch as well. I never felt the term "cluster" was unclear in these contexts. 
Sometimes it means "data directory", sometimes it is used for "server process", but I think few people would think one could connect to a data directory or create a process in a directory (initdb).

I think clarity is a Good Thing, but it can be overdone.

> > > server/host: We need a term to describe the underlying hardware respectively
> > > the virtual machine or container, where PG is running. I suggest to use both
> > > *server* and *host*. In computer science, both have their eligibility and are
> > > widely used. Everybody understands *client/server architecture* or *host* in
> > > TCP/IP configuration. We cannot change such matter of course. I suggest to
> > > use both depending on the context, but with the same meaning: "real hardware,
> > > a container, or a virtual machine".
> >
> > On this I have a strong opinion because of my Unix mindset.
> > "machine" and "host" are synonyms, and it doesn't matter to the database if they
> > are virtualized or not. You can always disambiguate by adding "virtual" or
> > "physical".
> >
> > A "server" is a piece of software that responds to client requests, never a machine.
> > In my book, this is purely Windows jargon. The term "client-server architecture"
> > that you quote emphasized that.
> >
> > Perhaps "machine" would be the preferable term, because "host" is more prone to
> > misunderstandings (except in a networking context).
>
> server/host: I agree that we are not interested in the question
> whether there is real hardware or any virtualization container. We
> are even not interested in the operating system. Our primary
> concern is the existence of a port of the Internet Protocol. But
> is the term "server" appropriate to name an IP-port? Additionally,
> "server" is used for other meanings: a) the previously mentioned
> "database server" b) a (virtual) machine: "server-side", "... the
> file ... loaded by the server ..." c) binaries "...
the server > must be built with SSL support ..." d) whenever it seems to be > appropriate: "standby server", "... the server parses query ...", > "server configuration", "server process". You are most thorough :^) > Because of its ambiguous usage, the definition of "server" must > clarify the allowed meanings. What's about: > > server: Depending on the context, the term *server* denotes: > > An IP-port which is offered by any OS. ? A port is a server? No way. > A - possibly virtualized - machine It might be good to disambiguate that, but I don't think that the PostgreSQL documentation should use the word "server" to mean "machine". > An abbreviation for the slightly longer term > "database(s)/cluster server" ??? this will support the > readability, but not the clarity ??? "Server" is short for "database server" and is a set of processes that listen for and handle incoming database client requests. I think that covers all the
Re: Add A Glossary
On 19.05.20 08:17, Laurenz Albe wrote: On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote: cluster/instance: PG (mainly) consists of a group of processes that commonly act on shared buffers. The processes are very closely related to each other and with the buffers. They exist altogether or not at all. They use a common initialization file and are incarnated by one command. Everything exists solely in RAM and therefor has a fluctuating nature. In summary: they build a unit and this unit needs to have a name of itself. In some pages we used to use the term *instance* - sometimes in extended forms: *database instance*, *PG instance*, *standby instance*, *standby server instance*, *server instance*, or *remote instance*. For me, the term *instance* makes sense, the extensions *standby instance* and *remote instance* in their context too. FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously, perhaps distinguishing between a "started cluster" and a "stopped cluster". After all, "cluster" refers to "a cluster of databases", which are there, regardless if you start the server or not. The term "cluster" is unfortunate, because to most people it suggests a group of machines, so the term "instance" is better, but that ship has sailed long ago. The static part of a cluster to me is the "data directory". cluster/instance: The different nature (static/dynamic) of what I call "cluster" and "instance" as well as the existence of the two commands "initdb — create a new PostgreSQL database cluster" and "pg_ctl — initialize, start, stop, or control a PostgreSQL server" confirms me in my opinion that we need two different terms for them. Those two terms shall not be synonym to each other, they label distinct things. If people prefer "data directory" instead of "cluster", this is ok for me. There are situations where we need a single term for both of them. "Instance and its data directory" or "Instance and its cluster" are too wordy. 
In many cases we use "database server" or "server" in this sense. Imo "Server" is too short and ambiguous. "database server", the plural form "databases server", or the new term "cluster server", which is more accurate, would be ok for me. (Similar to "server", the term "cluster" is also used in many different contexts - but only outside of the PG world; within our context "cluster" is not ambiguous.) server/host: We need a term to describe the underlying hardware respectively the virtual machine or container, where PG is running. I suggest to use both *server* and *host*. In computer science, both have their eligibility and are widely used. Everybody understands *client/server architecture* or *host* in TCP/IP configuration. We cannot change such matter of course. I suggest to use both depending on the context, but with the same meaning: "real hardware, a container, or a virtual machine". On this I have a strong opinion because of my Unix mindset. "machine" and "host" are synonyms, and it doesn't matter to the database if they are virtualized or not. You can always disambiguate by adding "virtual" or "physical". A "server" is a piece of software that responds to client requests, never a machine. In my book, this is purely Windows jargon. The term "client-server architecture" that you quote emphasized that. Perhaps "machine" would be the preferable term, because "host" is more prone to misunderstandings (except in a networking context). server/host: I agree that we are not interested in the question whether there is real hardware or any virtualization container. We are even not interested in the operating system. Our primary concern is the existence of a port of the Internet Protocol. But is the term "server" appropriate to name an IP-port? Additionally, "server" is used for other meanings: a) the previously mentioned "database server" b) a (virtual) machine: "server-side", "... the file ... loaded by the server ..." c) binaries "... 
the server must be built with SSL support ..." d) whenever it seems to be appropriate: "standby server", "... the server parses query ...", "server configuration", "server process". Because of its ambiguous usage, the definition of "server" must clarify the allowed meanings. What's about: -- server: Depending on the context, the term *server* denotes: * An IP-port which is offered by any OS. ? * A - possibly virtualized - machine * An abbreviation for the slightly longer term "database(s)/cluster server" ??? this will support the readability, but not the clarity ??? * More ? -- The term "host" is used mainly for IP configuration "host name", "host address" and in the context of compiling "host language", "host variable". These are clear situations and can be defined easily.
Re: Add A Glossary
On 2020-05-19 08:17, Laurenz Albe wrote: The term "cluster" is unfortunate, because to most people it suggests a group of machines, so the term "instance" is better, but that ship has sailed long ago. I don't see what would stop us from renaming some things, with some care. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add A Glossary
I think there needs to be a careful analysis of the language and a formal effort to stabilise it for the future.

In the context of, say, an Oracle T series, which is partitioned into multiple domains (virtual machines) in it, each of these has multiple CPUs, and can run an instance of the OS which hosts multiple virtual instances of the same or different OSes. Some domains might do this while others do not!

A host could be a domain, one of many virtual machines, or it could be one of many hosts on that VM but even these hosts could be virtual machines that each runs several virtual servers!

Of course, PostgreSQL can run on any tier of this regime, but the documentation at least needs to be consistent about language.

A "machine" should probably refer to hardware, although I would accept that a domain might count as "virtual hardware" while a host should probably refer to a single instance of OS.

Of course it is possible for a single instance of OS to run multiple instances of PostgreSQL, and people do this. (I have in the past).

Slightly more confusingly, it would appear possible for a single instance of an OS to have multiple IP addresses and if there are multiple instances of PostgreSQL, they may serve different IP Addresses uniquely, or share them. I think this case suggests that a host probably best describes an OS instance. I might be wrong.

The word "server" might be an instance of any of the above, or a waiter with a bowl of soup. It is best reserved for situations where clarity is not required.

If you are new to all this, I am sure it is very confusing, and inconsistent language is not going to help.

Andrew

AFAICT

On Tue, 19 May 2020 at 07:17, Laurenz Albe wrote:

> On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> > cluster/instance: PG (mainly) consists of a group of processes that commonly
> > act on shared buffers. The processes are very closely related to each other
> > and with the buffers. They exist altogether or not at all.
They use a > common > > initialization file and are incarnated by one command. Everything exists > > solely in RAM and therefor has a fluctuating nature. In summary: they > build > > a unit and this unit needs to have a name of itself. In some pages we > used > > to use the term *instance* - sometimes in extended forms: *database > instance*, > > *PG instance*, *standby instance*, *standby server instance*, *server > instance*, > > or *remote instance*. For me, the term *instance* makes sense, the > extensions > > *standby instance* and *remote instance* in their context too. > > FWIW, I feel somewhat like Alvaro on that point; I use those terms > synonymously, > perhaps distinguishing between a "started cluster" and a "stopped cluster". > After all, "cluster" refers to "a cluster of databases", which are there, > regardless > if you start the server or not. > > The term "cluster" is unfortunate, because to most people it suggests a > group of > machines, so the term "instance" is better, but that ship has sailed long > ago. > > The static part of a cluster to me is the "data directory". > > > server/host: We need a term to describe the underlying hardware > respectively > > the virtual machine or container, where PG is running. I suggest to use > both > > *server* and *host*. In computer science, both have their eligibility > and are > > widely used. Everybody understands *client/server architecture* or > *host* in > > TCP/IP configuration. We cannot change such matter of course. I suggest > to > > use both depending on the context, but with the same meaning: "real > hardware, > > a container, or a virtual machine". > > On this I have a strong opinion because of my Unix mindset. > "machine" and "host" are synonyms, and it doesn't matter to the database > if they > are virtualized or not. You can always disambiguate by adding "virtual" > or "physical". > > A "server" is a piece of software that responds to client requests, never > a machine. 
> In my book, this is purely Windows jargon. The term "client-server > architecture" > that you quote emphasized that. > > Perhaps "machine" would be the preferable term, because "host" is more > prone to > misunderstandings (except in a networking context). > > Yours, > Laurenz Albe > > > >
Re: Add A Glossary
On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> cluster/instance: PG (mainly) consists of a group of processes that commonly act on shared buffers. The processes are very closely related to each other and to the buffers. They exist all together or not at all. They use a common initialization file and are incarnated by one command. Everything exists solely in RAM and therefore has a fluctuating nature. In summary: they build a unit, and this unit needs to have a name of its own. In some pages we used to use the term *instance* - sometimes in extended forms: *database instance*, *PG instance*, *standby instance*, *standby server instance*, *server instance*, or *remote instance*. For me, the term *instance* makes sense, and so do the extensions *standby instance* and *remote instance* in their context.

FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously, perhaps distinguishing between a "started cluster" and a "stopped cluster". After all, "cluster" refers to "a cluster of databases", which are there regardless of whether you start the server or not.

The term "cluster" is unfortunate, because to most people it suggests a group of machines, so the term "instance" is better, but that ship has sailed long ago.

The static part of a cluster to me is the "data directory".

> server/host: We need a term to describe the underlying hardware, or the virtual machine or container, where PG is running. I suggest using both *server* and *host*. In computer science, both are legitimate and widely used. Everybody understands *client/server architecture* or *host* in TCP/IP configuration. We cannot change such matters of course. I suggest using both depending on the context, but with the same meaning: "real hardware, a container, or a virtual machine".

On this I have a strong opinion because of my Unix mindset. "machine" and "host" are synonyms, and it doesn't matter to the database if they are virtualized or not. You can always disambiguate by adding "virtual" or "physical".

A "server" is a piece of software that responds to client requests, never a machine. In my book, this is purely Windows jargon. The term "client-server architecture" that you quote emphasizes that.

Perhaps "machine" would be the preferable term, because "host" is more prone to misunderstandings (except in a networking context).

Yours,
Laurenz Albe
Re: Add A Glossary
On 17.05.20 17:28, Alvaro Herrera wrote:
> On 2020-May-17, Erik Rijkers wrote:
> > On 2020-05-17 08:51, Alvaro Herrera wrote:
> > > I don't think that's the general understanding of those terms. For all I know, they *are* synonyms, and there's no specific term for "the fluctuating objects" as you call them. The instance is either running (in which case there are processes and RAM) or it isn't.
> >
> > For what it's worth, I've also always understood 'instance' as 'a running database'. I admit it might be a left-over from my oracle years:
> >
> > https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
> >
> > There, 'instance' clearly refers to a running database. When that database is stopped, it ceases to be an instance.
>
> I've never understood it that way, but I'm open to having my opinion on it changed. So let's discuss it and maybe gather opinions from others.
>
> I think the terms under discussion are just
>
> * cluster
> * instance
> * server
>
> We don't have "host" (I just made it a synonym for server), but perhaps we can add that too, if it's useful.
>
> It would be good to be consistent with historical Postgres usage, such as the initdb usage of "cluster" etc.
>
> Perhaps we should not only define what our use of each term is, but also explain how each term is used outside PostgreSQL and highlight the differences. (This would be particularly useful for "cluster" ISTM.)

In fact, we have reached a point where we don't have a common understanding of a group of terms. I'm sure that we will meet some more situations like this in the future. Such discussions, subsequent decisions, and implementations in the docs are necessary to gain a solid foundation - primarily for newcomers (which is my primary motivation) as well as for more complex discussions among experts. Obviously, each of us will bring in their previous understanding of terms. But we should also be open to sometimes revising old terms.

Here are my two cents.

cluster/instance: PG (mainly) consists of a group of processes that commonly act on shared buffers. The processes are very closely related to each other and to the buffers. They exist all together or not at all. They use a common initialization file and are incarnated by one command. Everything exists solely in RAM and therefore has a fluctuating nature. In summary: they build a unit, and this unit needs to have a name of its own. In some pages we used to use the term *instance* - sometimes in extended forms: *database instance*, *PG instance*, *standby instance*, *standby server instance*, *server instance*, or *remote instance*. For me, the term *instance* makes sense, and so do the extensions *standby instance* and *remote instance* in their context.

The next essential component is the data itself. It is organized as a group of databases plus some common management information (global, pg_wal, pg_xact, pg_tblspc, ...). The complete data must be treated as a whole because the management information concerns all databases. Its nature is different from that of the processes and shared buffers. Of course, its content changes, but it has a steady nature. It even survives a 'power down'. There is one command to instantiate a new incarnation of the directory structure and all files. In summary, it's something of its own and should have its own name. 'database' is not possible because it consists of databases and other things. My favorite is *cluster*; *database cluster* is also possible.

server/host: We need a term to describe the underlying hardware, or the virtual machine or container, where PG is running. I suggest using both *server* and *host*. In computer science, both are legitimate and widely used. Everybody understands *client/server architecture* or *host* in TCP/IP configuration. We cannot change such matters of course. I suggest using both depending on the context, but with the same meaning: "real hardware, a container, or a virtual machine".

-- Jürgen Purtz (PS: I added the docs mailing list)