On 2020-07-17 11:32, Jürgen Purtz wrote:
On 12.07.20 22:45, Daniel Gustafsson wrote:
This patch no longer applies, due to conflicts in start.sgml, can you
please submit a rebased version?
cheers ./daniel
New version attached.
[0005-architecture.patch]
Hi,
I went through the architecture.sgml file once, and accumulated the
attached edits.
There are still far too many Unneeded Capitals On Words for my taste, but
I have not changed many of those. We could use some more opinions on
that, I suppose. (If it gets too quiet, maybe include the
pgsql-hackers list again?)
Thanks,
Erik Rijkers
--
Jürgen Purtz
--- doc/src/sgml/architecture.sgml.orig 2020-07-17 16:24:04.345941142 +0200
+++ doc/src/sgml/architecture.sgml 2020-07-18 19:04:30.694039877 +0200
@@ -4,36 +4,36 @@
<title>Architectural and implementational Cornerstones</title>
<para>
- Every DBMS implements basic strategies to achieve a fast and
+ Every DBMS implements basic strategies for a fast and
robust system. This chapter provides an overview of what
techniques <productname>PostgreSQL</productname> uses to
- reach this aim.
+ achieve this.
</para>
<sect1 id="tutorial-ram-proc-file">
<title>Collaboration of Processes, RAM, and Files</title>
<para>
- As is a matter of course, in a client/server architecture
+ In a client/server architecture
clients do not have direct access to the database. Instead,
- they merely send requests to the server-side and receive
- according information from there. In the case of
+ they send requests to the server and receive
+ the requested information. In the case of
<productname>PostgreSQL</productname>, at the server-side
there is one process per client, the so-called
<glossterm linkend="glossary-backend">Backend process</glossterm>.
It acts in close cooperation with the
<glossterm linkend="glossary-instance">Instance</glossterm> which
- is a group of tightly coupled other server-side processes plus a
+ is a group of server-side processes plus a
<glossterm linkend="glossary-shared-memory">Shared Memory</glossterm>
area.
</para>
<para>
- At start time, an instance is initiated by the
- <glossterm linkend="glossary-postmaster">Postmaster</glossterm>
+ At startup time, an instance is initiated by the
+ <glossterm linkend="glossary-postmaster">postmaster</glossterm>
process.
- It loads the configuration files, allocates the
+ The postmaster process loads the configuration files, allocates
<glossterm linkend="glossary-shared-memory">Shared Memory</glossterm>,
- and starts the comprehensive network of processes:
+ and starts a network of processes:
<glossterm linkend="glossary-background-writer">Background Writer</glossterm>,
<glossterm linkend="glossary-checkpointer">Checkpointer</glossterm>,
<glossterm linkend="glossary-wal-writer">WAL Writer</glossterm>,
@@ -65,8 +65,8 @@
Whenever a client application tries to connect to a
<glossterm linkend="glossary-database">database</glossterm>,
this request is handled in a first step by the <firstterm>
- Postgres process</firstterm>. It checks the authorization,
- starts a new <firstterm>Backend process</firstterm>,
+ postgres process</firstterm>. It checks authorization,
+ starts a new <firstterm>backend process</firstterm>,
and instructs the client application to connect to it. All
further client requests go to this process and are handled
by it.
@@ -83,20 +83,20 @@
<glossterm linkend="glossary-index">index</glossterm> files.
Because files are often larger than memory, it's likely that
the desired information is not (completely) available
- in the RAM. In this case the <firstterm>Backend process</firstterm>
+ in RAM. In this case the <firstterm>Backend process</firstterm>
must transfer additional file pages to
<firstterm>Shared Memory</firstterm>. Files are physically
organized in pages. Every transfer between files and
- RAM is performed in units of complete pages. Such transfers
- don't change the size or layout of pages.
+ RAM is performed in units of complete pages; such transfers
+ do not change the size or layout of pages.
</para>
<para>
- Reading file pages is notedly slower than reading
- RAM. This is the primary motivation for the existence of
+ Reading file pages is much slower than reading
+ RAM. This is the primary motivation for the use of
<firstterm>Shared Memory</firstterm>. As soon as one
of the <firstterm>Backend processes</firstterm> has
- read pages into memory, those pages are available for all
+ read pages into memory, those pages become available for all
other <firstterm>Backend processes</firstterm> for direct
access in RAM.
</para>
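Not part of the patch — just to illustrate the page-granularity transfer this hunk describes, here is a toy sketch (Python, purely illustrative; PAGE_SIZE matches PostgreSQL's default block size, everything else — BufferPool, the cache dict — is invented and is not how the real buffer manager works):

```python
import io

PAGE_SIZE = 8192  # PostgreSQL's default block size

class BufferPool:
    """Toy shared-memory cache: whole pages move between file and RAM."""
    def __init__(self, file):
        self.file = file
        self.cache = {}  # page number -> bytes, shared by all "backends"

    def read_page(self, pageno):
        if pageno not in self.cache:           # not yet in RAM
            self.file.seek(pageno * PAGE_SIZE)
            self.cache[pageno] = self.file.read(PAGE_SIZE)
        return self.cache[pageno]              # later readers hit the cache

# Two "backends" sharing one pool: the second read is served from RAM.
f = io.BytesIO(b"x" * (3 * PAGE_SIZE))
pool = BufferPool(f)
pool.read_page(1)
assert 1 in pool.cache and len(pool.cache[1]) == PAGE_SIZE
```

The point the text makes — transfers happen in units of complete pages, and a page read by one backend is afterwards available to all of them — is exactly what the shared cache dict stands in for here.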
@@ -121,13 +121,13 @@
First, whenever the content of a page changes, a
<glossterm linkend="glossary-wal-record">WAL record</glossterm>
is created out
- of the delta-information (difference between old and
- new content) and stored in another area of the
+ of the delta-information (difference between the old and
+ the new content) and stored in another area of
<firstterm>Shared Memory</firstterm>. These
<firstterm>WAL records</firstterm> are read by the
<firstterm>WAL Writer</firstterm> process,
which runs in parallel to the <firstterm>Backend
- processes</firstterm> and all other processes of
+ processes</firstterm> and other processes of
the <firstterm>Instance</firstterm>. It writes
the continuously arising <firstterm>WAL records</firstterm> to
the end of the current
@@ -137,7 +137,7 @@
to data files with <firstterm>heap</firstterm>
and <firstterm>index</firstterm> information.
As mentioned, this WAL-writing happens
- in an independent process. Nevertheless, all
+ in an independent process. All
<firstterm>WAL records</firstterm> created out of one
<firstterm>dirty page</firstterm> must be transferred
to disk before the <firstterm>dirty page</firstterm>
@@ -146,35 +146,34 @@
<para>
Second, the transfer of <firstterm>dirty buffers</firstterm>
- from <firstterm>Shared Memory</firstterm> to files must
- take place. This is the primary duty of the
+ from <firstterm>Shared Memory</firstterm> to files must
+ take place. This is the primary task of the
<firstterm>Background Writer</firstterm> process. Because
I/O activities can block other processes significantly,
it starts periodically and acts only for a short period.
- Doing so, his expensive I/O activities are spread over
- time, avoiding huge I/O peaks. Also, the <firstterm>
+ Doing so, its expensive I/O activities are spread over
+ time, avoiding debilitating I/O peaks. Also, the <firstterm>
Checkpointer</firstterm> process transfers
- <firstterm>dirty buffers</firstterm> to files —
+ <firstterm>dirty buffers</firstterm> to files —
see next paragraph.
</para>
<para>
- The <firstterm>Checkpointer</firstterm> has a special
- duty. As its name suggests, it has to create
- <firstterm>Checkpoints</firstterm>. Such a
+ The <firstterm>Checkpointer</firstterm> creates
+ <firstterm>Checkpoints</firstterm>. A
<glossterm linkend="glossary-checkpoint">Checkpoint</glossterm>
is a point in time when all older <firstterm>dirty buffers</firstterm>,
all older <firstterm>WAL records</firstterm>, and
finally a special <firstterm>Checkpoint record</firstterm>
have been written and flushed to disk.
- After a <firstterm>Checkpoint</firstterm>,
+ After a <firstterm>Checkpoint</firstterm>, we say
data files and <firstterm>WAL files</firstterm> are in sync.
In case of a recovery (after a crash of the instance)
- it is known that the information of all
+ it can be relied upon that the information of all
<firstterm>WAL records</firstterm> preceding
the last <firstterm>Checkpoint record</firstterm>
- is already integrated into the data files. This
- speeds up a possibly occurring recovery.
+ was already integrated into the data files. This
+ speeds up the recovery.
</para>
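To make the checkpoint/recovery relationship in this paragraph concrete, a toy replay model (illustrative only; the record format and function names are invented, not PostgreSQL's recovery code):

```python
# Toy WAL replay: only records after the last checkpoint must be applied,
# because a checkpoint guarantees that older changes already reached the
# data files.
wal = [
    ("data", "A=1"),
    ("data", "A=2"),
    ("checkpoint", None),   # everything before this is already on disk
    ("data", "B=7"),
]

def records_to_replay(wal):
    last_cp = max((i for i, (kind, _) in enumerate(wal)
                   if kind == "checkpoint"), default=-1)
    return [payload for kind, payload in wal[last_cp + 1:] if kind == "data"]

assert records_to_replay(wal) == ["B=7"]   # recovery skips A=1 and A=2
```

The more recent the last checkpoint record, the shorter the tail of WAL that recovery has to replay — which is the speedup the text refers to.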
<para>
@@ -184,7 +183,7 @@
Those <firstterm>WAL files</firstterm> — in combination with
a previously taken <firstterm>Base Backup</firstterm> —
are necessary to restore a database after a crash of the
- disk, where data files have been stored. Therefore it is
+ disk on which data files have been stored. Therefore it is
recommended to transfer a copy of the <firstterm>
WAL files</firstterm>
to a second, independent place. The purpose of the
@@ -201,10 +200,10 @@
</para>
<para>
- The <glossterm linkend="glossary-logger">Logger</glossterm> writes
+ The <glossterm linkend="glossary-logger">Logger</glossterm> process writes
text lines about serious and less serious events which can happen
- during database access, e.g., wrong password, no permission,
- long-running queries, ... .
+ during database access, e.g. wrong password, no permission,
+ long-running queries, etc.
</para>
</sect1>
@@ -214,16 +213,17 @@
<para>
<!-- TODO: Link to cluster -->
- On a <glossterm linkend="glossary-server">Server</glossterm>
- exists one or more <glossterm linkend="glossary-instance">Cluster</glossterm>,
- each of them contains three or more
- <glossterm linkend="glossary-database">databases</glossterm>, each
- database contains many <glossterm linkend="glossary-schema">schemas</glossterm>,
- a schema contains <glossterm linkend="glossary-table">tables</glossterm>,
+ A <glossterm linkend="glossary-server">Server</glossterm>
+ contains one or more <glossterm linkend="glossary-instance">Clusters</glossterm>.
+ Each cluster contains three or more
+ <glossterm linkend="glossary-database">databases</glossterm>. Each
+ database can contain many <glossterm linkend="glossary-schema">schemas</glossterm>
+ (one schema, 'public', is provided by default).
+ A schema can contain <glossterm linkend="glossary-table">tables</glossterm>,
<glossterm linkend="glossary-view">views</glossterm>, and a lot of other objects.
Each <firstterm>table</firstterm> or <firstterm>view</firstterm>
- belongs to a single <firstterm>schema</firstterm>; they cannot
- belong to another <firstterm>schema</firstterm>. The same is
+ belongs to a single <firstterm>schema</firstterm> only; it cannot
+ belong to any other <firstterm>schema</firstterm>. The same is
true for the schema/database and database/cluster relation.
<xref linkend="tutorial-cluster-db-schema-figure"/> visualizes
this hierarchy.
@@ -254,30 +254,30 @@
<para>
<literal>template0</literal> is the very first
<firstterm>database</firstterm> of any
- <firstterm>cluster</firstterm>. C-routines create
- <literal>template0</literal> during the initialization phase of
- the <firstterm>cluster</firstterm>.
- In a second step, <literal>template1</literal> is generated
- as a copy of <literal>template0</literal>, and finally
+ <firstterm>cluster</firstterm>.
+ Database <literal>template0</literal> is created during the
+ initialization phase of the <firstterm>cluster</firstterm>.
+ In a second step, database <literal>template1</literal> is generated
+ as a copy of <literal>template0</literal>, and finally database
<literal>postgres</literal> is generated as a copy of
- <literal>template1</literal>. All other
+ <literal>template1</literal>. Any
<glossterm linkend="app-createdb">new databases</glossterm>
- of this <firstterm>cluster</firstterm>,
- such as <literal>my_db</literal>, are also copied from
- <literal>template1</literal>. Due to the unique
- role of <literal>template0</literal> as the pristine origin
+ of the <firstterm>cluster</firstterm> that a user might need,
+ such as <literal>my_db</literal>, will be copied from the
+ <literal>template1</literal> database. Due to the unique
+ role of <literal>template0</literal> as the pristine original
of all other <firstterm>databases</firstterm>, no client
can connect to it.
</para>
<para>
- Every database contains <glossterm linkend="glossary-schema">
- schemas</glossterm>, and
+ Every database must contain <glossterm linkend="glossary-schema">
+ at least one schema</glossterm> because
<firstterm>schemas</firstterm> contain the other
<glossterm linkend="glossary-sql-object">SQL Objects</glossterm>.
<firstterm>Schemas</firstterm> are namespaces for
their <firstterm>SQL objects</firstterm> and ensure — with one
- exception — that within their scope names are used only once across all
+ exception — that within their scope, names are used only once across all
types of <firstterm>SQL objects</firstterm>. E.g., it is not possible
to have a table <literal>employee</literal> and a view
<literal>employee</literal> within the same
@@ -294,10 +294,10 @@
<para>
Some <firstterm>schemas</firstterm> are predefined.
<literal>public</literal> acts as the default
- <firstterm>schema</firstterm> and contains all such
- <firstterm>SQL objects</firstterm>, which are created
- within <literal>public</literal> or without using any schema
- name. <literal>public</literal> shall not contain user-defined
+ <firstterm>schema</firstterm> and contains all
+ <firstterm>SQL objects</firstterm> which are created
+ within <literal>public</literal> or without using an explicit schema
+ name. <literal>public</literal> should not contain user-defined
<firstterm>SQL objects</firstterm>. Instead, it is recommended to
create a separate <firstterm>schema</firstterm> that
holds individual objects like application-specific tables or
@@ -310,7 +310,7 @@
</para>
<para>
- There are a lot of different <firstterm>SQL object</firstterm>
+ There are many different <firstterm>SQL object</firstterm>
types: <firstterm>database, schema, table, view, materialized
view, index, constraint, sequence, function, procedure,
trigger, role, data type, operator, tablespace, extension,
@@ -353,7 +353,7 @@
<firstterm>Cluster</firstterm> has its root directory
somewhere in the file system. In many cases, the environment
variable <literal>PGDATA</literal> points to this directory.
- The example of the survey shown in
+ The example shown in
<xref linkend="tutorial-directories-figure"/> uses
<literal>data</literal> as the name of this root directory.
</para>
@@ -405,7 +405,7 @@
</para>
<para>
- Another prominent subdirectory is <literal>global</literal>.
+ Another subdirectory is <literal>global</literal>.
In analogy to the <firstterm>database</firstterm>-specific
subdirectories, there are files containing information about
<glossterm linkend="glossary-sql-object">Global SQL objects</glossterm>.
@@ -440,12 +440,12 @@
<para>
In the root directory <literal>data</literal>
there are also some files. In many cases, the configuration
- files of this <firstterm>cluster</firstterm> are stored
+ files of the <firstterm>cluster</firstterm> are stored
here. As long as the <firstterm>instance</firstterm>
is up and running, the file
<literal>postmaster.pid</literal> exists here
and contains the ID (pid) of the
- <firstterm>Postmaster</firstterm> process which
+ <firstterm>postmaster</firstterm> process which
has started the instance.
</para>
@@ -464,23 +464,23 @@
support many clients at the same time. Therefore, it is necessary to
protect concurrently running requests from unwanted overwriting
of other's data as well as from reading inconsistent data. Imagine an
- online shop offering the last copy of an article. Two clients show the
- article at their user interface. After a while, but at the same time,
+ online shop offering the last copy of an article. Two clients have the
+ article displayed in their user interface. After a while, but at the same time,
both users decide to put it to their shopping cart or even to buy it.
Both have seen the article, but only one can be allowed to get it.
The database must bring the two requests in a row, permit the access
- to one of them, block the other, and inform this one about the
- situation that the data was changed by a different process.
+ to one of them, block the other, and inform the blocked client
+ that the data was changed by a different process.
</para>
<para>
A first approach to implement protections against concurrent
accesses to the same data may be the locking of critical
- rows. There are two main categories of such techniques:
+ rows. Two such techniques are:
<emphasis>Optimistic Concurrency Control</emphasis> (OCC)
and <emphasis>Two Phase Locking</emphasis> (2PL).
- <productname>PostgreSQL</productname> implements the more
- sophisticated technique <firstterm>Multiversion Concurrency
+ <productname>PostgreSQL</productname> implements a third, more
+ sophisticated technique: <firstterm>Multiversion Concurrency
Control</firstterm> (MVCC). The crucial advantage of MVCC
over other technologies gets evident in multiuser OLTP
environments with a massive number of concurrent write
@@ -493,15 +493,15 @@
<para>
Instead of locking rows, the <firstterm>MVCC</firstterm> technique creates
- a new version of the same row when any data-change takes place. To
- distinguish between these versions as well as to track the timeline
+ a new version of the row when a data-change takes place. To
+ distinguish between these two versions and to track the timeline
of the row, each of the versions contains, in addition to their user-defined
columns, two special system columns, which are not visible
for the usual <command>SELECT * FROM ...</command> command.
The column <literal>xmin</literal> contains the transaction ID (xid)
- of the transaction, which creates this version of the row. Accordingly,
+ of the transaction, which created this version of the row. Accordingly,
<literal>xmax</literal> contains the xid of the transaction, which has
- deleted this version, respectively a zero, if the version is not
+ deleted this version, or zero, if the version is not
deleted. You can read both with the command
<command>SELECT xmin, xmax, * FROM ... </command>.
</para>
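The xmin/xmax bookkeeping this hunk describes can be modelled in a few lines (a simplified sketch, not PostgreSQL's visibility rules — it ignores in-progress and aborted transactions; all names are invented):

```python
# Toy MVCC versions: each carries xmin (creator xid) and xmax
# (deleter xid, or 0 while the version is still live).
def visible(version, my_xid, committed):
    """A version is visible if its creator committed (or is us) and no
    committed transaction has deleted it yet (heavily simplified)."""
    xmin, xmax = version["xmin"], version["xmax"]
    created = xmin in committed or xmin == my_xid
    deleted = xmax != 0 and (xmax in committed or xmax == my_xid)
    return created and not deleted

# An UPDATE leaves the old version in place and only stamps its xmax:
old = {"xmin": 123, "xmax": 135, "data": "x"}   # superseded by xid 135
new = {"xmin": 135, "xmax": 0,   "data": "y"}
committed = {123, 135}
assert not visible(old, my_xid=200, committed=committed)
assert visible(new, my_xid=200, committed=committed)
```

This is the essence of the "no locking, new version per change" idea: readers pick the version whose (xmin, xmax) window matches their snapshot instead of waiting on a lock.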
@@ -519,9 +519,9 @@
</para>
<para>
- Please note that the description in this chapter simplifies the situation
- by omitting details. When many transactions are running simultaneously,
- things can get very complicated. Sometimes they get aborted via
+ The description in this chapter simplifies matters by omitting details.
+ When many transactions are running simultaneously,
+ things can get complicated. Sometimes transactions get aborted via
ROLLBACK immediately or after a lot of other activities, sometimes
a single row is involved in more than one transaction, sometimes
a client crashes, sometimes the sequence of xids restarts
@@ -567,7 +567,7 @@
changing the user data from <literal>'x'</literal> to
<literal>'y'</literal>. According to the MVCC principles,
the data in the old version of the row does not change!
- The value <literal>'x'</literal> keeps as it was before.
+ The value <literal>'x'</literal> remains as it was before.
Only <literal>xmax</literal> changes to <literal>135</literal>.
Now, this version is treated as valid exclusively for
transactions with xids from <literal>123</literal> to
@@ -591,7 +591,7 @@
<para>
Finally, a row may be deleted by a <command>DELETE</command>
- command. Even in this case, all versions of the row keep as
+ command. Even in this case, all versions of the row remain as
before. Nothing is thrown away so far! Only <literal>xmax</literal>
of the last version changes to the xid of the <command>DELETE</command>
transaction, which indicates that it is only valid for
@@ -603,14 +603,14 @@
<para>
In summary, the MVCC technology creates more and more versions
of the same row in the table's heap file and leaves them there,
- even with a <command>DELETE</command> command. The youngest
+ even after a <command>DELETE</command> command. Only the youngest
version is relevant for all future transactions. But the
system must also preserve some of the older ones for a
- certain amount of time because the possiblility exists that
- they are or could become relevant for any of the pending
+ certain amount of time because the possibility exists that
+ they are or could become relevant for any pending
transactions. Over time, also the older ones get out of scope
for ALL transactions and therefore become unnecessary.
- Nevertheless, they exist physically on the disk and occupy
+ Nevertheless, they do exist physically on the disk and occupy
space.
</para>
@@ -629,7 +629,7 @@
xids grow, old row versions get out of scope over time.
If an old row version is no longer valid for ALL existing
transactions, it's called <firstterm>dead</firstterm>. The
- space occupied by the sum of all dead row versions is
+ space occupied by all dead row versions is
called <firstterm>bloat</firstterm>.
</simpara>
</listitem>
@@ -637,7 +637,7 @@
<listitem>
<simpara>
Internally, an <command>UPDATE</command> command acts in the
- same way as a <command>DELETE</command> command, followed by
+ same way as a <command>DELETE</command> command followed by
an <command>INSERT</command> command.
</simpara>
</listitem>
@@ -646,7 +646,7 @@
<simpara>
Nothing gets wiped away — with the consequence that the database
occupies more and more disk space. It is obvious that
- this behavior has to be automatically corrected in some
+ this behavior has to be corrected in some
way. The next chapter explains how AUTOVACUUM fulfills
this task.
</simpara>
@@ -664,7 +664,7 @@
more and more disk space, the <firstterm>bloat</firstterm>.
This chapter explains how the SQL command
<firstterm>VACUUM</firstterm> and the automatically running
- <firstterm>AUTOVACUUM</firstterm> processes clear the situation
+ <firstterm>AUTOVACUUM</firstterm> processes clean up
by eliminating <firstterm>bloat</firstterm>.
</para>
@@ -672,9 +672,9 @@
<para>
<firstterm>AUTOVACUUM</firstterm> runs automatically by
default. Its default parameters as well as such for
- <firstterm>VACUUM</firstterm> fits well for most standard
+ <firstterm>VACUUM</firstterm> fit well for most standard
situations. Therefore a novice database manager can
- easily skip the rest of this chapter, which explains
+ easily skip the rest of this chapter which explains
a lot of details.
</para>
</note>
@@ -682,16 +682,16 @@
<para>
Client processes can issue the SQL command VACUUM at arbitrary
points in time. DBAs do this when they recognize special situations,
- or they start it in batch jobs, which run periodically.
+ or they start it in batch jobs which run periodically.
AUTOVACUUM processes run as part of the
<link linkend="glossary-instance">Instance</link> at the server.
- There is a constantly running AUTOVACUUM daemon. He permanently
+ There is a constantly running AUTOVACUUM daemon. It permanently
controls the state of all databases based on values that are
collected by the <link linkend="glossary-stats-collector">
Statistics Collector</link> and starts
- AUTOVACUUM processes whenever he detects certain situations.
+ AUTOVACUUM processes whenever it detects certain situations.
Thus, it's a dynamic behavior of <productname>PostgreSQL</productname>
- with the intention to tidy up — not always, but whenever it
+ with the intention to tidy up — whenever it
is appropriate.
</para>
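For the "certain situations" the daemon reacts to: as far as I know the documented trigger for vacuuming a table is a threshold formula over the statistics the collector gathers. A sketch (defaults taken from the GUC docs, otherwise simplified and with invented names):

```python
# Autovacuum's per-table trigger condition (sketch): a vacuum of the
# table starts once its dead tuples exceed
#   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples.
def needs_autovacuum(dead_tuples, reltuples,
                     threshold=50, scale_factor=0.2):
    return dead_tuples > threshold + scale_factor * reltuples

assert not needs_autovacuum(dead_tuples=60, reltuples=1000)   # 60 <= 250
assert needs_autovacuum(dead_tuples=300, reltuples=1000)      # 300 > 250
```

The scale factor is why small tables are vacuumed after a handful of changes while a huge table accumulates proportionally more bloat before the daemon steps in.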
@@ -712,7 +712,7 @@
<firstterm>Freeze</firstterm>: Mark the youngest row version
as frozen. This means that the version
is always treated as valid (visible) independent from
- the <firstterm>wraparound problematic</firstterm> (see below).
+ the <firstterm>wraparound problem</firstterm> (see below).
</simpara>
</listitem>
@@ -729,22 +729,22 @@
<simpara>
<emphasis>Statistics</emphasis>: Collect statistics about the
number of rows per table, the distribution of values, and so on,
- as the basis for query planner's decision making.
+ as the basis for decisions of the query planner.
</simpara>
</listitem>
</itemizedlist>
<para>
- The eagerness — you can call it 'aggressivity' — of the
+ The eagerness — you can call it 'aggressiveness' — of the
operations <emphasis>eliminating bloat</emphasis> and
<emphasis>freeze</emphasis> is controlled by configuration
parameters, runtime flags, and in extreme situations by
- themselves. Because vacuum operations typically are I/O
+ the processes themselves. Because vacuum operations typically are I/O
intensive, which can hinder other activities, AUTOVACUUM
avoids performing many vacuum operations in bulk. Instead,
it carries out many small actions with time gaps in between.
- The SQL command VACUUM runs immediately without any
+ The SQL command VACUUM runs immediately and without any
time gaps.
</para>
@@ -794,8 +794,8 @@
After the vacuum operation detects a superfluous row version, it
marks its space as free for future use of writing
actions. Only in rare situations (or in the case of VACUUM FULL),
- this space is released to the operating system. In most cases,
- it keeps occupied by PostgreSQL and will be used by future
+ is this space released to the operating system. In most cases,
+ it remains occupied by PostgreSQL and will be used by future
<command>INSERT</command> or <command>UPDATE</command>
commands concerning this row or a completely different one.
</para>
@@ -827,7 +827,7 @@
<listitem>
<simpara>
When a client issues the SQL command VACUUM with the option FULL.
- Also, in this mode, the bloat disappears, but the used strategy
+ Also, in this mode, the bloat disappears, but the strategy used
is very different: In this case, the complete table is copied
to a different file skipping all outdated row versions. This
leads to a significant reduction of used disk space because
@@ -839,8 +839,8 @@
<listitem>
<simpara>
When an AUTOVACUUM process acts. For optimization
- purposes, he considers the Visibility Map in the same way as
- VACUUM. Additionally, he ignores tables with few modifications;
+ purposes, it considers the Visibility Map in the same way as
+ VACUUM. Additionally, it ignores tables with few modifications;
see <xref linkend="guc-autovacuum-vacuum-threshold"/>,
which defaults to 50 rows and
<xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
@@ -864,7 +864,7 @@
a certain number of new transactions they are forced to restart
from the beginning, which is called <firstterm>wraparound</firstterm>.
Therefore the terms 'old transaction' / 'young transaction' does
- not always correlate with low / hight values of xids. Near to the
+ not always correlate with low / high values of xids. Near to the
wraparound point, there are cases where xmin has a higher value
than xmax, although their meaning is said to be older than xmax.
</para>
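The "higher xmin than xmax yet older" effect near the wraparound point follows from circular comparison of xids. A sketch of modulo-2^31 ordering in the style PostgreSQL uses internally (illustrative; the function name is invented):

```python
# Circular xid comparison: with 32-bit xids that wrap around, "older"
# cannot mean "numerically smaller". Instead, a precedes b if the
# signed 32-bit difference (a - b) is negative, i.e. b lies within the
# 2^31 xids "ahead of" a.
def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 2**31          # negative when read as signed 32-bit

assert xid_precedes(100, 200)              # plainly older
# Near the wraparound point a numerically *higher* xid can be older:
assert xid_precedes(2**32 - 10, 5)         # 4294967286 precedes 5
assert not xid_precedes(5, 2**32 - 10)
```

So each xid effectively sees half the circle as its past and half as its future — which is exactly the two-halves split the following section describes.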
@@ -891,10 +891,10 @@
The use of a limited range of IDs for transactions leads
to the necessity to restart the sequence sooner or later.
This does not only have the rare consequence previously
- described that sometimes xmin is huger than xmax. The far
+ described that sometimes xmin is higher than xmax. The far
more critical problem is that whenever the system has
to evaluate a WHERE condition, it must decide which row
- versions are valid (visible) from the perspective of the
+ version is valid (visible) from the perspective of the
transaction of this query. If a wraparound couldn't happen,
this decision would be relatively easy: the xid
must be between xmin and xmax, and the corresponding
@@ -911,7 +911,7 @@
<listitem>
<simpara>
- In a first step, PostgreSQL divides the complete range of
+ As a first step, PostgreSQL divides the complete range of
possible xids into two halves with the two split-points
'txid_current' and 'txid_current + 2^31'. The half behind
'txid_current' is considered to represent xids of the
@@ -975,13 +975,13 @@
</para>
<para>
- At what point in time the freeze operation will take place?
+ At what point in time does the freeze operation take place?
<itemizedlist>
<listitem>
<simpara>
When a client issues the SQL command VACUUM with its
- FREEZE option. In this case, all such pages are
+ FREEZE option. In this case, all pages are
processed that are marked in the Visibility Map
to potentially have unfrozen rows.
</simpara>
@@ -989,12 +989,12 @@
<listitem>
<simpara>
When a client issues the SQL command VACUUM without any
- option but finds that there are xids older than
+ options but finds that there are xids older than
<xref linkend="guc-vacuum-freeze-table-age"/>
(default: 150 million) minus
<xref linkend="guc-vacuum-freeze-min-age"/>
(default: 50 million).
- As before, all such pages are processed that are
+ As before, all pages are processed that are
marked in the Visibility Map to potentially have unfrozen
rows.
</simpara>
@@ -1008,11 +1008,11 @@
<itemizedlist>
<listitem>
<simpara>
- In the <emphasis>normal mode</emphasis>, he skips
+ In the <emphasis>normal mode</emphasis>, it skips
pages with row versions that are younger than
<xref linkend="guc-vacuum-freeze-min-age"/>
(default: 50 million) and works only on pages where
- all xids are older. The skipping of jung xids prevents
+ all xids are older. The skipping of young xids prevents
work on such pages, which are likely to be changed
by one of the future SQL commands.
</simpara>
@@ -1020,7 +1020,7 @@
<listitem>
<simpara>
The process switches
- to an <emphasis>aggressive mode</emphasis> if he recognizes
+ to an <emphasis>aggressive mode</emphasis> if it recognizes
that for the processed table their oldest xid exceeds
<xref linkend="guc-autovacuum-freeze-max-age"/>
(default: 200 million). The value of the oldest unfrozen
@@ -1038,12 +1038,12 @@
<para>
In the first two cases and with autovacuum in
- <emphasis>aggressive mode</emphasis>, the system knowns
+ <emphasis>aggressive mode</emphasis>, the system knows
to which value the oldest unfrozen xid has moved forward and
logs the value in <emphasis>pg_class.relfrozenxid</emphasis>.
The distance between this value and the 'txid_current' split
point becomes smaller, and the distance to 'txid_current + 2^31'
- larger than before.
+ becomes larger than before.
</para>
<figure id="tutorial-freeze-figure">
@@ -1069,7 +1069,7 @@
running <firstterm>autovacuum daemon</firstterm>. If the
daemon detects that for a table <firstterm>
autovacuum_freeze_max_age</firstterm> is exceeded, it starts
- an AUTOVACUUM process in the <emphasis>aggressive mode</emphasis>
+ an AUTOVACUUM process in <emphasis>aggressive mode</emphasis>
(see above) — even if AUTOVACUUM is disabled.
</para>
@@ -1079,8 +1079,8 @@
The <link linkend="glossary-vm">Visibility Map</link>
(VM) contains two flags — stored as
two bits — for each page of the heap. If the first bit
- is set, it indicates that the associated page does not
- contain any bloat. If the second one is set, it indicates
+ is set, that indicates that the associated page does not
+ contain any bloat. If the second one is set, that indicates
that the page contains only frozen rows.
</para>
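The two-flags-per-page idea is simple enough to sketch (a toy model with invented names; the real VM packs two bits per page, here it's a byte per page for readability):

```python
# Toy visibility map: two flags per heap page, as described above —
# ALL_VISIBLE (the page contains no bloat) and ALL_FROZEN (the page
# contains only frozen rows).
ALL_VISIBLE = 0b01
ALL_FROZEN  = 0b10

class VisibilityMap:
    def __init__(self, npages):
        self.bits = bytearray(npages)      # one byte per page in this toy

    def set_flag(self, pageno, flag):
        self.bits[pageno] |= flag

    def clear(self, pageno):               # any data change clears both
        self.bits[pageno] = 0

    def all_frozen(self, pageno):
        return bool(self.bits[pageno] & ALL_FROZEN)

vm = VisibilityMap(4)
vm.set_flag(2, ALL_VISIBLE | ALL_FROZEN)
assert vm.all_frozen(2)
vm.clear(2)                                # an UPDATE touched page 2
assert not vm.all_frozen(2)
```

Vacuum can then skip every page whose flags are still set, which is where the acceleration mentioned a few paragraphs further down comes from.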
@@ -1099,7 +1099,7 @@
<para>
The setting of the flags is silently done by VACUUM
and AUTOVACUUM during their bloat and freeze operations.
- This is done to accelerate future vacuum actions,
+ This is done to speed up future vacuum actions,
regular accesses to heap pages, and some accesses to
the index. Every data-modifying operation on any row
version of the page clears the flags.
@@ -1122,7 +1122,7 @@
linkend="planner-stats">Query Planner</link> to make optimal
decisions for the generation of execution plans. This
information can be gathered with the SQL commands ANALYZE
- or VACUUM ANALYZE. But also autovacuum processes gather
+ or VACUUM ANALYZE. But autovacuum processes also gather
such information. Depending on the percentage of changed rows
per table <xref linkend="guc-autovacuum-analyze-scale-factor"/>,
the autovacuum daemon starts autovacuum processes to collect
@@ -1144,7 +1144,7 @@
<link linkend="tutorial-transactions">Transactions</link>
are a fundamental concept of relational database systems.
Their essential point is that they bundle multiple
- read- or write-operations into a single, all-or-nothing
+ read- or write-operations into a single all-or-nothing
operation. Furthermore, they separate and protect concurrent
actions of different connections from each other. Thereby
they implement the ACID paradigm.
@@ -1152,7 +1152,7 @@
<para>
In <productname>PostgreSQL</productname> there are two ways
- to establish a transaction. The explicite way uses the keywords
+ to establish a transaction. The explicit way uses the keywords
<link linkend="sql-begin">BEGIN</link> and
<link linkend="sql-commit">COMMIT</link> (respectively
<link linkend="sql-rollback">ROLLBACK</link>) before
@@ -1188,9 +1188,9 @@
</para>
<para>
- The atomicity also affects the visibility of changes. All
- connections running simultaneously to a data modifying
- transaction will never see any change before the
+ The atomicity also affects the visibility of changes. No
+ connection running concurrently with a data-modifying
+ transaction will ever see any change before the
transaction successfully executes a <command>COMMIT</command>
— even in the lowest
<link linkend="transaction-iso">isolation level</link>
@@ -1221,7 +1221,7 @@
</para>
<para>
- <productname>PostgreSQL</productname> overcomes the
- problem by showing only such row versions to other
- transactions whose originating transaction is
+ <productname>PostgreSQL</productname> overcomes this
+ problem by showing other transactions only those row
+ versions whose originating transaction has been
successfully committed. It skips all row versions of
@@ -1229,9 +1229,9 @@
<productname>PostgreSQL</productname> solves one more
problem. Even the single <command>COMMIT</command>
command needs a short time interval for its execution.
- Therefor its critical 'dead-or-survival' phase
- runs in a priviledged mode where it cannot be
- interupted by other processes.
+ Therefore its critical 'dead-or-survival' phase
+ runs in a privileged mode where it cannot be
+ interrupted by other processes.
</para>
<bridgehead renderas="sect2">What are the benefits?</bridgehead>
@@ -1247,10 +1247,10 @@
the transfers of some money from one account to another.
It is obvious
that the decrease of the one and the increase of the
- other are impartible. Nevertheless, there is no particular
+ other must be indivisible. Nevertheless, there is no particular
need for an application to do something to ensure the
<glossterm linkend="glossary-atomicity">atomicity</glossterm>
- of its behavior. It's enough to surround them with
+ of this behavior. It's enough to surround them with
<command>BEGIN</command> and <command>COMMIT</command>.
</para>
@@ -1260,10 +1260,10 @@
conditions. In such cases, the application simply issues a
<command>ROLLBACK</command> command instead of a
<command>COMMIT</command>. The <command>ROLLBACK</command>
- cancels the transaction, and all changes made so far retain
+ cancels the transaction, and all changes made so far remain
invisible forever; it's like they never happened. There
is no need for the application to log its activities and
- undo every single step.
+ undo every step of the transaction separately.
</para>
<para>
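(Aside: in other words, error handling in the application can reduce to a single command; roughly, with the same hypothetical accounts table:)

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE owner = 'Alice';
-- the application detects a problem, e.g. insufficient funds
ROLLBACK;  -- every change since BEGIN vanishes; no manual undo needed
```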
@@ -1282,7 +1282,7 @@
<para>
Also, all self-evident — but possibly not obvious
— low-level demands on the database system are
- ensured; e.g., index entries for rows must become
+ ensured; e.g. index entries for rows must become
visible at the same moment as the rows themselves.
</para>
@@ -1307,7 +1307,7 @@
<para>
Nothing is perfect and failures inevitably happen.
- However, the most common types of failures are
+ However, the most common types of failure are
well known and <productname>PostgreSQL</productname>
implements strategies to overcome them.
Such strategies use parts of the previously presented
@@ -1356,7 +1356,7 @@
actions. The WAL records are written first. Second,
the data itself shall exist in the heap and index files.
- In opposite to the WAL records, this part may or may
- not be transferred entirely from shared buffers to the files.
+ In contrast to the WAL records, this part may or may
+ not have been transferred entirely from shared buffers (RAM) to the files.
</para>
<para>
The automatic recovery searches within the WAL files for
@@ -1378,7 +1378,7 @@
<bridgehead renderas="sect3">Disk crash</bridgehead>
<para>
- If a disk crashes, the course of actions described previously
+ If a disk crashes, the course of action described previously
cannot work. It is likely that the WAL files and/or the
data and index files are no longer available. You need
to take special actions to overcome such situations.
@@ -1453,14 +1453,14 @@
<link linkend="backup-file">copy</link>
of the cluster's directory structure and files. In
case of severe problems such a copy can serve as
- the source of a recovery. But in order to get a
+ the source of recovery. But in order to get a
<emphasis>USABLE</emphasis> backup by this method,
the database server <emphasis>MUST</emphasis> be
- shut down during the complete runtime of the copy
+ shut down for the entire duration of the copy
command!
</para>
<para>
- The apparent disadvantage of this method is that there
+ The obvious disadvantage of this method is that there
is a downtime where no user interaction is possible.
</para>
@@ -1515,7 +1515,7 @@
If configured, the
<glossterm linkend="glossary-wal-archiver">Archiver process</glossterm>
- will automatically copy every single WAL file to a save location.
+ will automatically copy every single WAL file to a safe location.
- <link linkend="backup-archiving-wal">It's configuration</link>
+ <link linkend="backup-archiving-wal">Its configuration</link>
consists mainly of a string, which contains a copy command
in the operating system's syntax. In order to protect your
data against a disk crash, the destination location