Re: Add A Glossary

2020-07-21 Thread Alvaro Herrera
On 2020-Jul-21, Jürgen Purtz wrote:

> - Added '(process)' to the two terms 'Autovacuum' and 'Stats Collector'
> 
> - Removed link to itself in 'Logger (process)'
> 
> - new term: Base Backup

Pushed.  I was not courageous enough to include "base backup" in 13, so
that one's in master only, but the other ones are in both branches.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-07-21 Thread Jürgen Purtz

On 19.06.20 19:10, Alvaro Herrera wrote:
> Thanks for these fixes!  I included all of these.
>
> On 2020-Jun-19, Erik Rijkers wrote:
>
> > And one thing that I am not sure of (but strikes me as a bit odd):
> > there are several cases of
> > 'are enforced unique'. Should that not be
> > 'are enforced to be unique'  ?
>
> I included this change too; I am not too sure of it myself.  If some
> English language neatnik wants to argue one way or the other, be my
> guest.

- Added '(process)' to the two terms 'Autovacuum' and 'Stats Collector'

- Removed link to itself in 'Logger (process)'

- new term: Base Backup


--

Jürgen Purtz


diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 76525c6302..58e5071642 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -108,7 +108,7 @@
   
 
   
-   Autovacuum
+   Autovacuum (process)

 
  A set of background processes that routinely perform
@@ -178,6 +178,19 @@

   
 
+  
+   Base Backup
+   
+
+ A binary copy of all
+ database cluster
+ files. It is generated by the tool .  
+ In combination with WAL files it can be used as the starting point
+ for recovery, log shipping, or streaming replication.
+
+   
+  
+
   
Bloat

@@ -855,8 +868,7 @@
Logger (process)

 
- If activated, the
- Logger process
+ If activated, the process
  writes information about database events into the current
  log file.
  When reaching certain time- or
@@ -1486,7 +1498,7 @@
   
 
   
-   Stats collector
+   Stats collector (process)

 
  This process collects statistical information about the


Re: Add A Glossary

2020-06-19 Thread Alvaro Herrera
Thanks for these fixes!  I included all of these.

On 2020-Jun-19, Erik Rijkers wrote:

> And one thing that I am not sure of (but strikes me as a bit odd):
> there are several cases of
> 'are enforced unique'. Should that not be
> 'are enforced to be unique'  ?

I included this change too; I am not too sure of it myself.  If some
English language neatnik wants to argue one way or the other, be my
guest.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-06-19 Thread Erik Rijkers

On 2020-06-19 01:51, Alvaro Herrera wrote:

On 2020-Jun-16, Justin Pryzby wrote:

On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:


I noticed one typo:

'aggregates functions'  should be
'aggregate functions'


And one thing that I am not sure of (but strikes me as a bit odd):
there are several cases of
'are enforced unique'. Should that not be
'are enforced to be unique'  ?


Another small mistake (2x):

'The name of such objects of the same type are'  should be
'The names of such objects of the same type are'

(this phrase occurs 2x wrong, 1x correct)


thanks,

Erik Rijkers

Re: Add A Glossary

2020-06-18 Thread Alvaro Herrera
On 2020-Jun-16, Justin Pryzby wrote:
> On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:

Thanks for the review.  I merged all your suggestions.  This one:

> >    Most local objects belong to a specific
> > +  schema in their
> > +  containing database, such as
> > +  all types of relations,
> > +  all types of functions,
> 
> Maybe say: >Relations< (all types), and >Functions< (all types)

led me down not one but two rabbit holes; first I realized that
"functions" is an insufficient term since procedures should also be
included but weren't, so I had to add the more generic term "routine"
and then modify the definitions of all routine types to mix in well.  I
think overall the quality of these definitions is improved as a result.

I also felt the need to revise the definition of "relations", so I did
that too; this made me change the definition of resultset too.

On 2020-Jun-17, Jürgen Purtz wrote:

> +1, with two formal changes:
> 
> -  Rearrangement of term "Data page" to meet alphabetical order.

To forestall these ordering issues (look, another rabbit hole), I
grepped the file for all glossterms and sorted that under en_US rules,
then reordered the terms to match that.  Turns out there were several
other ordering mistakes.

git grep ''  | sed -e 's/<[^>]*>\([^<]*\)<[^>]*>/\1/' > orig
LC_COLLATE=en_US.UTF-8 sort orig > sorted

(Eliminating the tags is important, otherwise the sort uses the tags
themselves to disambiguate)
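
The same ordering check can be sketched in Python. This is a minimal sketch, not the commands used above. It assumes the terms are wrapped in `<glossterm>` tags (the grep pattern in the archived message is empty, so the tag name is an assumption), and it approximates en_US collation with a simple key that ignores case, spaces, and punctuation instead of using the real locale rules. All function names are hypothetical.

```python
import re

# Assumed tag name; the grep pattern in the archived message is empty.
GLOSSTERM_RE = re.compile(r"<glossterm>([^<]*)</glossterm>")


def extract_terms(sgml_text):
    """Return glossary terms in document order, with the tags stripped."""
    return [m.group(1).strip() for m in GLOSSTERM_RE.finditer(sgml_text)]


def collation_key(term):
    """Rough stand-in for en_US ordering: ignore case, spaces, punctuation."""
    return re.sub(r"[^0-9a-z]", "", term.casefold())


def misordered_terms(terms):
    """Return (previous, current) pairs where current sorts before previous."""
    return [
        (prev, cur)
        for prev, cur in zip(terms, terms[1:])
        if collation_key(cur) < collation_key(prev)
    ]


sample = """
<glossterm>Database</glossterm>
<glossterm>Database cluster</glossterm>
<glossterm>Data directory</glossterm>
<glossterm>Data page</glossterm>
"""
print(misordered_terms(extract_terms(sample)))  # prints []
```

Stripping the tags before comparing mirrors the point made above: if the markup stayed in the strings, it would take part in the comparison and distort the order.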

> One last question: The definition of "Data directory" reads "... A cluster's
> storage space comprises the data directory plus ..." and 'cluster' links to
> "glossary-instance". Shouldn't it link to "glossary-db-cluster"?

Yes, an oversight, thanks.

I also added TPS, because I had already written it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 25b03f3b37..5274feabba 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -23,7 +23,7 @@
   
 
   
-   Aggregate function
+   Aggregate function (routine)

 
  A function that
@@ -39,6 +39,11 @@

   
 
+  
+   Analytic function
+   
+  
+
   
Analyze (operation)

@@ -57,11 +62,6 @@

   
 
-  
-   Analytic function
-   
-  
-
   
Atomic

@@ -389,40 +389,33 @@

   
 
-  
-   Data directory
+  
+   Database

 
- The base directory on the filesystem of a
- server that contains all
- data files and subdirectories associated with an
- instance (with the
- exception of tablespaces).
- The environment variable PGDATA is commonly used to
- refer to the
- data directory.
-
-
- An instance's storage
- space comprises the data directory plus any additional tablespaces.
+ A named collection of
+ local SQL objects.
 
 
  For more information, see
- .
+ .
 

   
 
-  
-   Database
+  
+   Database cluster

 
- A named collection of
- SQL objects.
+ A collection of databases and global SQL objects,
+ and their common static and dynamic metadata.
+ Sometimes referred to as a
+ cluster.
 
 
- For more information, see
- .
+ In PostgreSQL, the term
+ cluster is also sometimes used to refer to an instance.
+ (Don't confuse this term with the SQL command CLUSTER.)
 

   
@@ -432,6 +425,31 @@

   
 
+  
+   Data directory
+   
+
+ The base directory on the filesystem of a
+ server that contains all
+ data files and subdirectories associated with a
+ database cluster
+ (with the exception of
+ tablespaces,
+ and optionally WAL).
+ The environment variable PGDATA is commonly used to
+ refer to the data directory.
+
+
+ A cluster's storage
+ space comprises the data directory plus any additional tablespaces.
+
+
+ For more information, see
+ .
+
+   
+  
+
   
Data page

@@ -578,7 +596,7 @@
   
 
   
-   Foreign table
+   Foreign table (relation)

 
  A relation which appears to have
@@ -631,12 +649,20 @@
   
 
   
-   Function
+   Function (routine)

 
- Any defined transformation of data. Many functions are already defined
- within PostgreSQL itself, but user-defined
- ones can also be added.
+ A type of routine that receives zero or more arguments, returns zero or more
+ output values, and is constrained to run within one transaction.
+ Functions are invoked as part of a query, for example via
+ SELECT.
+ Certain functions can return
+ sets; those are
+ called set-returning functions.
+
+
+ Functions can also be used for
+ triggers to invoke.
 
 
  For more information, see
@@ -689,13 +715,12 @@
   
 
   
-   Index
+   Index (relation)

 
  A relation 

Re: Add A Glossary

2020-06-17 Thread Jürgen Purtz


On 17.06.20 02:09, Alvaro Herrera wrote:
> On 2020-Jun-09, Jürgen Purtz wrote:
>
> > Can you agree to the following definitions? If no, we can alternatively
> > formulate for each of them: "Under discussion - currently not defined". My
> > proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into
> > databases, and a collection of databases managed by a single PostgreSQL
> > server instance constitutes a database cluster."
>
> After sleeping on it a few more times, I don't oppose the idea of making
> "instance" be the running state and "database cluster" the on-disk stuff
> that supports the instance.  Here's a patch that does things pretty much
> along the lines you suggested.
>
> I made small adjustments to "SQL objects":
>
> * SQL objects in schemas were said to have their names unique in the
> schema, but we failed to say anything about names of objects not in
> schemas and global objects.  Added that.
>
> * Had example object types for global objects and objects not in
> schemas, but no examples for objects in schemas.  Added that.
>
>
> Some programs whose output we could tweak per this:
> pg_ctl
> > pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server.
> >  -D, --pgdata=DATADIR   location of the database storage area
> to:
> > pg_ctl is a utility to initialize or control a PostgreSQL database cluster.
> >  -D, --pgdata=DATADIR   location of the database directory
>
> pg_basebackup:
> > pg_basebackup takes a base backup of a running PostgreSQL server.
> to:
> > pg_basebackup takes a base backup of a PostgreSQL instance.


+1, with two formal changes:

-  Rearrangement of term "Data page" to meet alphabetical order.

-  Add  in one case to meet xml-well-formedness.


One last question: The definition of "Data directory" reads "... A 
cluster's storage space comprises the data directory plus ..." and 
'cluster' links to "glossary-instance". Shouldn't it link to 
"glossary-db-cluster"?


--

Jürgen Purtz


diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index e29b55e5ac..0499f9044f 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -413,6 +413,22 @@

   
 
+  
+   Data page
+   
+
+ The basic structure used to store relation data.
+ All pages are of the same size.
+ Data pages are typically stored on disk, each in a specific file,
+ and can be read to shared buffers
+ where they can be modified, becoming
+ dirty.  They become clean when written
+ to disk.  New pages, which initially exist in memory only, are also
+ dirty until written.
+
+   
+  
+
   
Database

@@ -441,6 +457,7 @@
  cluster is also sometimes used to refer to an instance.
  (Don't confuse this term with the SQL command CLUSTER.)
 
+   
   
 
   
@@ -448,22 +465,6 @@

   
 
-  
-   Data page
-   
-
- The basic structure used to store relation data.
- All pages are of the same size.
- Data pages are typically stored on disk, each in a specific file,
- and can be read to shared buffers
- where they can be modified, becoming
- dirty.  They become clean when written
- to disk.  New pages, which initially exist in memory only, are also
- dirty until written.
-
-   
-  
-
   
Datum



Re: Add A Glossary

2020-06-16 Thread Justin Pryzby
On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:
> diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
> index 25b03f3b37..e29b55e5ac 100644
> --- a/doc/src/sgml/glossary.sgml
> +++ b/doc/src/sgml/glossary.sgml
> @@ -395,15 +395,15 @@
>  
>   The base directory on the filesystem of a
>   server that contains all
> - data files and subdirectories associated with an
> - instance (with the
> - exception of tablespaces).
> + data files and subdirectories associated with a
> + database cluster
> + (with the exception of
> + tablespaces).

and (optionally) WAL

> +  
> +   Database cluster
> +   
> +
> + A collection of databases and global SQL objects,
> + and their common static and dynamic meta-data.

metadata

> @@ -1245,12 +1255,17 @@
>   SQL objects,
>   which all reside in the same
>   database.
> - Each SQL object must reside in exactly one schema.
> + Each SQL object must reside in exactly one schema
> + (though certain types of SQL objects exist outside schemas).

(except for global objects which ..)

>  
>   The names of SQL objects of the same type in the same schema are enforced
>   to be unique.
>   There is no restriction on reusing a name in multiple schemas.
> + For local objects that exist outside schemas, their names are enforced
> + unique across the whole database.  For global objects, their names

I would say "unique within the database"

> + are enforced unique across the whole
> + database cluster.

and "unique within the whole db cluster"

>    Most local objects belong to a specific
> -  schema in their containing database.
> +  schema in their
> +  containing database, such as
> +  all types of relations,
> +  all types of functions,

Maybe say: >Relations< (all types), and >Functions< (all types)

>   used as the default one for all SQL objects, called pg_default.
>   
>  
"the default" (remove "one")

-- 
Justin




Re: Add A Glossary

2020-06-16 Thread Alvaro Herrera
On 2020-Jun-09, Jürgen Purtz wrote:

> Can you agree to the following definitions? If no, we can alternatively
> formulate for each of them: "Under discussion - currently not defined". My
> proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into
> databases, and a collection of databases managed by a single PostgreSQL
> server instance constitutes a database cluster."

After sleeping on it a few more times, I don't oppose the idea of making
"instance" be the running state and "database cluster" the on-disk stuff
that supports the instance.  Here's a patch that does things pretty much
along the lines you suggested.

I made small adjustments to "SQL objects":

* SQL objects in schemas were said to have their names unique in the
schema, but we failed to say anything about names of objects not in
schemas and global objects.  Added that.

* Had example object types for global objects and objects not in
schemas, but no examples for objects in schemas.  Added that.
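
To make the three uniqueness scopes concrete, here is a small Python sketch. It only illustrates the rule described in the two points above and is not how PostgreSQL implements it. The class, its method, and the database and schema names are hypothetical, and the object types used in the example calls (table, extension, role) are my own picks for illustration, not the examples from the patch.

```python
class NameRegistry:
    """Tracks object names and rejects duplicates within the same scope."""

    def __init__(self):
        self._seen = set()

    def register(self, obj_type, name, database=None, schema=None):
        # Global objects:               database=None, schema=None
        #   -> name unique within the whole database cluster.
        # Local objects outside schemas: database set, schema=None
        #   -> name unique within that database.
        # Objects in schemas:            database and schema set
        #   -> name unique within that schema (per object type).
        if schema is not None and database is None:
            raise ValueError("a schema always belongs to a database")
        key = (database, schema, obj_type, name)
        if key in self._seen:
            raise ValueError(f"duplicate {obj_type} name {name!r} in this scope")
        self._seen.add(key)


reg = NameRegistry()
reg.register("table", "orders", database="shop", schema="public")
reg.register("table", "orders", database="shop", schema="archive")  # other schema: allowed
reg.register("extension", "hstore", database="shop")                # local, outside schemas
reg.register("role", "alice")                                       # global object
```

Registering a second `role` named `alice`, or a second `orders` table in `shop.public`, raises `ValueError`, while reusing a name in another schema does not.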


Some programs whose output we could tweak per this:
pg_ctl
> pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server.
>  -D, --pgdata=DATADIR   location of the database storage area
to:
> pg_ctl is a utility to initialize or control a PostgreSQL database cluster.
>  -D, --pgdata=DATADIR   location of the database directory

pg_basebackup:
> pg_basebackup takes a base backup of a running PostgreSQL server.
to:
> pg_basebackup takes a base backup of a PostgreSQL instance.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
index 25b03f3b37..e29b55e5ac 100644
--- a/doc/src/sgml/glossary.sgml
+++ b/doc/src/sgml/glossary.sgml
@@ -395,15 +395,15 @@
 
  The base directory on the filesystem of a
  server that contains all
- data files and subdirectories associated with an
- instance (with the
- exception of tablespaces).
+ data files and subdirectories associated with a
+ database cluster
+ (with the exception of
+ tablespaces).
  The environment variable PGDATA is commonly used to
- refer to the
- data directory.
+ refer to the data directory.
 
 
- An instance's storage
+ A cluster's storage
  space comprises the data directory plus any additional tablespaces.
 
 
@@ -418,7 +418,7 @@

 
  A named collection of
- SQL objects.
+ local SQL objects.
 
 
  For more information, see
@@ -427,6 +427,22 @@

   
 
+  
+   Database cluster
+   
+
+ A collection of databases and global SQL objects,
+ and their common static and dynamic meta-data.
+ Sometimes referred to as a
+ cluster.
+
+
+ In PostgreSQL, the term
+ cluster is also sometimes used to refer to an instance.
+ (Don't confuse this term with the SQL command CLUSTER.)
+
+  
+
   
Database server

@@ -634,7 +650,7 @@
Function

 
- Any defined transformation of data. Many functions are already defined
+ A defined transformation of data.  Many functions are already defined
  within PostgreSQL itself, but user-defined
  ones can also be added.
 
@@ -724,14 +740,12 @@
Instance

 
- A set of databases and accompanying global SQL objects that are stored in
- the same data directory
- in a single server.
- If running, one
+ A group of backend and auxiliary processes that communicate using
+ a common shared memory area.  One 
  postmaster process
- manages a group of backend and auxiliary processes that communicate
- using a common shared memory
- area.  Many instances can run on the same
+ manages the instance; one instance manages exactly one
+ database cluster
+ with all its databases.  Many instances can run on the same
  server
  as long as their TCP ports do not conflict.
 
@@ -739,14 +753,10 @@
  The instance handles all key features of a DBMS:
  read and write access to files and shared memory,
  assurance of the ACID properties,
- connections to client processes,
+ connections to
+ client processes,
  privilege verification, crash recovery, replication, etc.
 
-
- In PostgreSQL, the term
- cluster is also sometimes used to refer to an instance.
- (Don't confuse this term with the SQL command CLUSTER.)
-

   
 
@@ -1245,12 +1255,17 @@
  SQL objects,
  which all reside in the same
  database.
- Each SQL object must reside in exactly one schema.
+ Each SQL object must reside in exactly one schema
+ (though certain types of SQL objects exist outside schemas).
 
 
  The names of SQL objects of the same type in the same schema are enforced
  to be unique.
  There is no restriction on reusing a name in multiple schemas.
+ For local objects that 

Re: Add A Glossary

2020-06-09 Thread Jürgen Purtz

On 17.05.20 17:28, Alvaro Herrera wrote:

I think the terms under discussion are just

* cluster
* instance
* server



Despite its short existence, the glossary has already gained some 
importance; see 
https://www.postgresql.org/message-id/b8e12875ebec9e6d3107df5fa1129e1e%40postgrespro.ru
We have to be careful with what we publish: it is not acceptable to 
change definitions from release to release. Therefore, IMO, we should mark 
or even ignore those terms for which we cannot reach consensus.


Can you agree to the following definitions? If no, we can alternatively 
formulate for each of them: "Under discussion - currently not defined". 
My proposals are inspired by chapter 2.2 Concepts: "Tables are grouped 
into databases, and a collection of databases managed by a single 
PostgreSQL server instance constitutes a database cluster."



- "Database" (No change to existing definition): "A named collection of 
SQL objects."



- "Database Cluster", "Cluster" (New definition and rearrangements of 
some sentences): "A collection of related databases, and their common 
static and dynamic meta-data.


This term is sometimes used to refer to an instance.

(Don't confuse the term CLUSTER with the SQL command CLUSTER.)"


- "Data Directory" (Replaced 'instance' by 'cluster'): "The base 
directory on the filesystem of a server that contains all data files and 
subdirectories associated with a cluster (with the exception of 
tablespaces). The environment variable PGDATA is commonly used to refer 
to the data directory.


A cluster's storage space comprises the data directory plus any 
additional tablespaces.


For more information, see Section 68.1."


- "Database Server", "Instance" (Major changes): "A group of backend and 
auxiliary processes that communicate using a common shared memory area. 
One postmaster process manages the instance; one instance manages 
exactly one cluster with all its databases. Many instances can run on 
the same server as long as their TCP ports do not conflict.


The instance handles all key features of a DBMS: read and write access 
to files and shared memory, assurance of the ACID properties, 
connections to client processes, privilege verification, crash recovery, 
replication, etc."



- "Server" (No change to existing definition): "A computer on which 
PostgreSQL instances run. The term server denotes real hardware, a 
container, or a virtual machine.


This term is sometimes used to refer to an instance or to a host."


- "Host" (No change to existing definition): "A computer that 
communicates with other computers over a network. This is sometimes used 
as a synonym for server. It is also used to refer to a computer where 
client processes run."
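
The relationships in these proposed definitions can be sketched as a small data model. This is only an illustration in Python with hypothetical class names, field names, and paths; it is not PostgreSQL code. It simplifies "TCP ports do not conflict" to "no two instances on one server use the same port number", and it adds one assumption that the definitions do not state explicitly: a cluster is managed by at most one running instance at a time.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """On-disk state: a data directory holding a collection of databases."""
    data_directory: str
    databases: list = field(default_factory=list)


@dataclass
class Instance:
    """Running state: a group of processes managing exactly one cluster."""
    cluster: Cluster
    port: int


@dataclass
class Server:
    """A machine (real hardware, container, or VM) on which instances run."""
    instances: list = field(default_factory=list)

    def start(self, cluster, port):
        for inst in self.instances:
            if inst.port == port:
                raise RuntimeError(f"port {port} is already in use")
            # Assumption: one cluster is served by at most one instance.
            if inst.cluster is cluster:
                raise RuntimeError("cluster is already managed by an instance")
        inst = Instance(cluster=cluster, port=port)
        self.instances.append(inst)
        return inst


machine = Server()
main = Cluster("/srv/pg/main", ["postgres", "shop"])
test = Cluster("/srv/pg/test", ["postgres"])
machine.start(main, port=5432)
machine.start(test, port=5433)  # a second instance on the same server
```

In this model the cluster exists whether or not an instance is running on it, which is the static/dynamic distinction the definitions are trying to capture.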



--

Jürgen Purtz






Re: Add A Glossary

2020-05-20 Thread Laurenz Albe
On Wed, 2020-05-20 at 13:17 +0200, Jürgen Purtz wrote:
> > FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
> > perhaps distinguishing between a "started cluster" and a "stopped cluster".
> > After all, "cluster" refers to "a cluster of databases", which are there, regardless
> > if you start the server or not.
> > 
> > The term "cluster" is unfortunate, because to most people it suggests a group of
> > machines, so the term "instance" is better, but that ship has sailed long ago.
> > 
> > The static part of a cluster to me is the "data directory".
>   
> cluster/instance: The different nature (static/dynamic) of what I
>   call "cluster" and "instance" as well as the existence of the two
>   commands "initdb — create a new PostgreSQL database cluster" and 
>   "pg_ctl — initialize, start, stop, or control a PostgreSQL server"
>   confirms me in my opinion that we need two different terms for
>   them.

I think that the "pg_ctl" example does not apply:
It does not talk about starting the cluster, but about starting the server process,
that is "server" in the way I understand it.

> There are situations where we need a single term for both of
>   them. "Instance and its data directory" or "Instance and its
>   cluster" are too wordy. In many cases we use "database server" or
>   "server" in this sense. Imo "Server" is too short and ambiguous.
>   "database server", the plural form "databases server", or the new
>   term "cluster server", which is more accurate, would be ok for me.
>   (Similar to "server", the term "cluster" is also used in many
>   different contexts - but only outside of the PG world; within our
>   context "cluster" is not ambiguous.) 

That does not feel right to me.

"cluster server", ouch. "databases server", ouch as well.

I never felt the term "cluster" was unclear in these contexts.
Sometimes it means "data directory", sometimes it is used for "server process",
but I think few people would think one could connect to a data directory
or create a process in a directory (initdb).

I think clarity is a Good Thing, but it can be overdone.

> > > server/host: We need a term to describe the underlying hardware respectively
> > > the virtual machine or container, where PG is running. I suggest to use both
> > > *server* and *host*. In computer science, both have their eligibility and are
> > > widely used. Everybody understands *client/server architecture* or *host* in
> > > TCP/IP configuration. We cannot change such matter of course. I suggest to
> > > use both depending on the context, but with the same meaning: "real hardware,
> > > a container, or a virtual machine".
> > 
> > On this I have a strong opinion because of my Unix mindset.
> > "machine" and "host" are synonyms, and it doesn't matter to the database if they
> > are virtualized or not.  You can always disambiguate by adding "virtual" or "physical".
> > 
> > A "server" is a piece of software that responds to client requests, never a machine.
> > In my book, this is purely Windows jargon.  The term "client-server architecture"
> > that you quote emphasized that.
> > 
> > Perhaps "machine" would be the preferable term, because "host" is more prone to
> > misunderstandings (except in a networking context).
> 
> server/host: I agree that we are not interested in the question
>   whether there is real hardware or any virtualization container. We
>   are even not interested in the operating system. Our primary
>   concern is the existence of a port of the Internet Protocol. But
>   is the term "server" appropriate to name an IP-port? Additionally,
>   "server" is used for other meanings: a) the previously mentioned
>   "database server" b) a (virtual) machine: "server-side", "... the
>   file ... loaded by the server ..." c) binaries "... the server
>   must be built with SSL support ..." d) whenever it seems to be
>   appropriate: "standby server", "... the server parses query ...",
>   "server configuration", "server process".

You are most thorough :^)
   
> Because of its ambiguous usage, the definition of "server" must
>   clarify the allowed meanings. What's about:
> 
> server: Depending on the context, the term *server* denotes:
>   
> An IP-port which is offered by any OS.   ?

A port is a server?  No way.
  
> A - possibly virtualized - machine

It might be good to disambiguate that, but I don't think that the PostgreSQL
documentation should use the word "server" to mean "machine".

> An abbreviation for the slightly longer term
> "database(s)/cluster server"  ??? this will support the
> readability, but not the clarity ???

"Server" is short for "database server" and is a set of processes that listen
for and handle incoming database client requests.

I think that covers all the 

Re: Add A Glossary

2020-05-20 Thread Jürgen Purtz

On 19.05.20 08:17, Laurenz Albe wrote:
> On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> > cluster/instance: PG (mainly) consists of a group of processes that commonly
> > act on shared buffers. The processes are very closely related to each other
> > and with the buffers. They exist altogether or not at all. They use a common
> > initialization file and are incarnated by one command. Everything exists
> > solely in RAM and therefore has a fluctuating nature. In summary: they build
> > a unit and this unit needs to have a name of itself. In some pages we used
> > to use the term *instance* - sometimes in extended forms: *database instance*,
> > *PG instance*, *standby instance*, *standby server instance*, *server instance*,
> > or *remote instance*.  For me, the term *instance* makes sense, the extensions
> > *standby instance* and *remote instance* in their context too.
>
> FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
> perhaps distinguishing between a "started cluster" and a "stopped cluster".
> After all, "cluster" refers to "a cluster of databases", which are there, regardless
> if you start the server or not.
>
> The term "cluster" is unfortunate, because to most people it suggests a group of
> machines, so the term "instance" is better, but that ship has sailed long ago.
>
> The static part of a cluster to me is the "data directory".


cluster/instance: The different nature (static/dynamic) of what I call 
"cluster" and "instance" as well as the existence of the two commands 
"initdb — create a new PostgreSQL database cluster" and "pg_ctl — 
initialize, start, stop, or control a PostgreSQL server" confirms my 
opinion that we need two different terms for them. Those two terms 
shall not be synonyms for each other; they label distinct things. If 
people prefer "data directory" instead of "cluster", this is ok for me.


There are situations where we need a single term for both of them. 
"Instance and its data directory" or "Instance and its cluster" are too 
wordy. In many cases we use "database server" or "server" in this sense. 
Imo "Server" is too short and ambiguous. "database server", the plural 
form "databases server", or the new term "cluster server", which is more 
accurate, would be ok for me. (Similar to "server", the term "cluster" 
is also used in many different contexts - but only outside of the PG 
world; within our context "cluster" is not ambiguous.)



> > server/host: We need a term to describe the underlying hardware respectively
> > the virtual machine or container, where PG is running. I suggest to use both
> > *server* and *host*. In computer science, both have their eligibility and are
> > widely used. Everybody understands *client/server architecture* or *host* in
> > TCP/IP configuration. We cannot change such matter of course. I suggest to
> > use both depending on the context, but with the same meaning: "real hardware,
> > a container, or a virtual machine".
>
> On this I have a strong opinion because of my Unix mindset.
> "machine" and "host" are synonyms, and it doesn't matter to the database if they
> are virtualized or not.  You can always disambiguate by adding "virtual" or "physical".
>
> A "server" is a piece of software that responds to client requests, never a machine.
> In my book, this is purely Windows jargon.  The term "client-server architecture"
> that you quote emphasized that.
>
> Perhaps "machine" would be the preferable term, because "host" is more prone to
> misunderstandings (except in a networking context).

server/host: I agree that we are not interested in the question whether 
there is real hardware or any virtualization container. We are not even 
interested in the operating system. Our primary concern is the existence 
of a port of the Internet Protocol. But is the term "server" appropriate 
to name an IP-port? Additionally, "server" is used for other meanings: 
a) the previously mentioned "database server" b) a (virtual) machine: 
"server-side", "... the file ... loaded by the server ..." c) binaries 
"... the server must be built with SSL support ..." d) whenever it seems 
to be appropriate: "standby server", "... the server parses query ...", 
"server configuration", "server process".


Because of its ambiguous usage, the definition of "server" must clarify 
the allowed meanings. What about:


--

server: Depending on the context, the term *server* denotes:

 * An IP-port which is offered by any OS.   ?
 * A - possibly virtualized - machine
 * An abbreviation for the slightly longer term "database(s)/cluster
   server"  ??? this will support the readability, but not the clarity ???
 * More ?

--

The term "host" is used mainly for IP configuration "host name", "host 
address" and in the context of compiling "host language", "host 
variable". These are clear situations and can be defined easily.





Re: Add A Glossary

2020-05-19 Thread Peter Eisentraut

On 2020-05-19 08:17, Laurenz Albe wrote:

The term "cluster" is unfortunate, because to most people it suggests a group of
machines, so the term "instance" is better, but that ship has sailed long ago.


I don't see what would stop us from renaming some things, with some care.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add A Glossary

2020-05-19 Thread Andrew Grillet
I think there needs to be a careful analysis of the language and a formal
effort to stabilise it for the future.

In the context of, say, an Oracle T series, which is partitioned into
multiple domains (virtual machines), each of these has multiple CPUs
and can run an instance of the OS which hosts multiple virtual instances
of the same or different OSes. Some domains might do this while others do
not!

A host could be a domain, one of many virtual machines, or it could be one
of many hosts on that VM
but even these hosts could be virtual machines that each runs several
virtual servers!

Of course, PostgreSQL can run on any tier of this regime, but the
documentation at least needs to be consistent
about language.

A "machine" should probably refer to hardware, although I would accept that
a domain might count as "virtual
hardware" while a host should probably refer to a single instance of OS.

Of course it is possible for a single  instance of OS to run multiple
instances of PostgreSQL, and people do this. (I have
in the past).

Slightly more confusingly, it would appear possible for a single instance
of an OS to have multiple IP addresses, and if there are multiple instances
of PostgreSQL, they may serve different IP addresses uniquely, or share
them. I think this case suggests that "host" probably best describes an OS
instance. I might be wrong.

The word "server" might be an instance of any of the above, or a waiter
with a bowl of soup. It is best
reserved for situations where clarity is not required.

If you are new to all this, I am sure it is very confusing, and
inconsistent language is not going to help.

Andrew



AFAICT







Re: Add A Glossary

2020-05-19 Thread Laurenz Albe
On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> cluster/instance: PG (mainly) consists of a group of processes that commonly
> act on shared buffers. The processes are very closely related to each other
> and with the buffers. They exist altogether or not at all. They use a common
> initialization file and are incarnated by one command. Everything exists
> solely in RAM and therefore has a fluctuating nature. In summary: they build
> a unit and this unit needs to have a name of itself. In some pages we used
> to use the term *instance* - sometimes in extended forms: *database instance*,
> *PG instance*, *standby instance*, *standby server instance*, *server instance*,
> or *remote instance*.  For me, the term *instance* makes sense, the extensions
> *standby instance* and *remote instance* in their context too.

FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
perhaps distinguishing between a "started cluster" and a "stopped cluster".
After all, "cluster" refers to "a cluster of databases", which are there
regardless of whether you start the server or not.

The term "cluster" is unfortunate, because to most people it suggests a group of
machines, so the term "instance" is better, but that ship has sailed long ago.

The static part of a cluster to me is the "data directory".

> server/host: We need a term to describe the underlying hardware respectively
> the virtual machine or container, where PG is running. I suggest to use both
> *server* and *host*. In computer science, both have their eligibility and are
> widely used. Everybody understands *client/server architecture* or *host* in
> TCP/IP configuration. We cannot change such matter of course. I suggest to
> use both depending on the context, but with the same meaning: "real hardware,
> a container, or a virtual machine".

On this I have a strong opinion because of my Unix mindset.
"machine" and "host" are synonyms, and it doesn't matter to the database if they
are virtualized or not.  You can always disambiguate by adding "virtual" or
"physical".

A "server" is a piece of software that responds to client requests, never a
machine.  In my book, calling a machine a "server" is purely Windows jargon.
The term "client-server architecture" that you quote emphasizes that.

Perhaps "machine" would be the preferable term, because "host" is more prone to
misunderstandings (except in a networking context).

Yours,
Laurenz Albe





Re: Add A Glossary

2020-05-18 Thread Jürgen Purtz

On 17.05.20 17:28, Alvaro Herrera wrote:
> On 2020-May-17, Erik Rijkers wrote:
> > On 2020-05-17 08:51, Alvaro Herrera wrote:
> > > I don't think that's the general understanding of those terms.  For all
> > > I know, they *are* synonyms, and there's no specific term for "the
> > > fluctuating objects" as you call them.  The instance is either running
> > > (in which case there are processes and RAM) or it isn't.
> >
> > For what it's worth, I've also always understood 'instance' as 'a running
> > database'.  I admit it might be a left-over from my oracle years:
> >
> > https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
> >
> > There, 'instance' clearly refers to a running database.  When that database
> > is stopped, it ceases to be an instance.
>
> I've never understood it that way, but I'm open to having my opinion on
> it changed.  So let's discuss it and maybe gather opinions from others.
>
> I think the terms under discussion are just
>
> * cluster
> * instance
> * server
>
> We don't have "host" (I just made it a synonym for server), but perhaps
> we can add that too, if it's useful.  It would be good to be consistent
> with historical Postgres usage, such as the initdb usage of "cluster"
> etc.
>
> Perhaps we should not only define what our use of each term is, but also
> explain how each term is used outside PostgreSQL and highlight the
> differences.  (This would be particularly useful for "cluster" ISTM.)


In fact, we have reached a point where we don't have a common
understanding of a group of terms. I'm sure that we will meet more
situations like this in the future. Such discussions, subsequent
decisions, and implementations in the docs are necessary to gain a solid
foundation - primarily for newcomers (which is my primary motivation) as
well as for more complex discussions among experts. Obviously, each of
us will bring their previous understanding of terms. But we should also
be open to sometimes revising old terms.


Here are my two cents.

cluster/instance: PG (mainly) consists of a group of processes that
jointly act on shared buffers. The processes are very closely related
to each other and to the buffers. They exist all together or not at all.
They use a common initialization file and are started by one command.
Everything exists solely in RAM and therefore has a fluctuating nature.
In summary: they form a unit, and this unit needs a name of its own.
On some pages we already use the term *instance*, sometimes in
extended forms: *database instance*, *PG instance*, *standby instance*,
*standby server instance*, *server instance*, or *remote instance*.  For
me, the term *instance* makes sense, and so do the extensions *standby
instance* and *remote instance* in their contexts.


The next essential component is the data itself. It is organized as a 
group of databases plus some common management information (global, 
pg_wal, pg_xact, pg_tblspc, ...). The complete data must be treated as a 
whole because the management information concerns all databases. Its 
nature is different from the processes and shared buffers. Of course, 
its content changes, but it has a steady nature. It even survives a 
'power down'. There is one command to instantiate a new incarnation of 
the directory structure and all files. In summary, it's something of its 
own and should have its own name. 'database' is not possible because it 
consists of databases and other things. My favorite is *cluster*; 
*database cluster* is also possible.
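
As a rough illustration of this on-disk unit, the sketch below checks
whether a directory contains the subdirectories named above (global,
pg_wal, pg_xact, pg_tblspc). It is only a heuristic under stated
assumptions: the list is incomplete (the "..." above stands for further
entries), and the names EXPECTED_ENTRIES, missing_cluster_entries and
looks_like_cluster are hypothetical, not part of any PostgreSQL tool.

```python
from pathlib import Path

# Subdirectories named in the paragraph above. This is an assumption-based,
# incomplete list; a real data directory contains further entries.
EXPECTED_ENTRIES = ("global", "pg_wal", "pg_xact", "pg_tblspc")


def missing_cluster_entries(data_dir):
    """Return the expected subdirectories that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).is_dir()]


def looks_like_cluster(data_dir):
    """Heuristic: True if every expected subdirectory is present."""
    return not missing_cluster_entries(data_dir)
```

Note that the check works whether or not any server processes are
running, which matches the point that the files persist independently of
the processes and shared buffers.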


server/host: We need a term to describe the underlying hardware, or
the virtual machine or container, where PG is running. I suggest using
both *server* and *host*. In computer science, both are legitimate and
widely used. Everybody understands *client/server architecture* or
*host* in TCP/IP configuration. We cannot change such established
usage. I suggest using both depending on the context, but with the
same meaning: "real hardware, a container, or a virtual machine".


--

Jürgen Purtz

(PS: I added the docs mailing list)