On 2020-07-17 11:32, Jürgen Purtz wrote:
On 12.07.20 22:45, Daniel Gustafsson wrote:
This patch no longer applies, due to conflicts in start.sgml, can you
please submit a rebased version?
cheers ./daniel
New version attached.
[0005-architecture.patch]
Hi,
I went through the architecture.sgml file once, and accumulated the
attached edits.
There are still far too many Unneeded Capitals On Words for my taste, but
I have not changed many of those. We could use some more opinions on
that, I suppose. (If it gets too quiet, maybe include the
pgsql-hackers list again?)
Thanks,
Erik Rijkers
--
Jürgen Purtz
--- doc/src/sgml/architecture.sgml.orig 2020-07-17 16:24:04.345941142 +0200
+++ doc/src/sgml/architecture.sgml 2020-07-18 19:04:30.694039877 +0200
@@ -4,36 +4,36 @@
<title>Architectural and implementational Cornerstones</title>
<para>
- Every DBMS implements basic strategies to achieve a fast and
+ Every DBMS implements basic strategies for a fast and
robust system. This chapter provides an overview of what
techniques <productname>PostgreSQL</productname> uses to
- reach this aim.
+ achieve this.
</para>
<sect1 id="tutorial-ram-proc-file">
<title>Collaboration of Processes, RAM, and Files</title>
<para>
- As is a matter of course, in a client/server architecture
+ In a client/server architecture
clients do not have direct access to the database. Instead,
- they merely send requests to the server-side and receive
- according information from there. In the case of
+ they send requests to the server and receive
+ the requested information. In the case of
<productname>PostgreSQL</productname>, at the server-side
there is one process per client, the so-called
<glossterm linkend="glossary-backend">Backend process</glossterm>.
It acts in close cooperation with the
<glossterm linkend="glossary-instance">Instance</glossterm> which
- is a group of tightly coupled other server-side processes plus a
+ is a group of server-side processes plus a
<glossterm linkend="glossary-shared-memory">Shared Memory</glossterm>
area.
</para>
<para>
- At start time, an instance is initiated by the
- <glossterm linkend="glossary-postmaster">Postmaster</glossterm>
+ At startup time, an instance is initiated by the
+ <glossterm linkend="glossary-postmaster">postmaster</glossterm>
process.
- It loads the configuration files, allocates the
+ The postmaster process loads the configuration files, allocates
<glossterm linkend="glossary-shared-memory">Shared Memory</glossterm>,
- and starts the comprehensive network of processes:
+ and starts a network of processes:
<glossterm linkend="glossary-background-writer">Background Writer</glossterm>,
<glossterm linkend="glossary-checkpointer">Checkpointer</glossterm>,
<glossterm linkend="glossary-wal-writer">WAL Writer</glossterm>,
@@ -65,8 +65,8 @@
Whenever a client application tries to connect to a
<glossterm linkend="glossary-database">database</glossterm>,
this request is handled in a first step by the <firstterm>
- Postgres process</firstterm>. It checks the authorization,
- starts a new <firstterm>Backend process</firstterm>,
+ postgres process</firstterm>. It checks authorization,
+ starts a new <firstterm>backend process</firstterm>,
and instructs the client application to connect to it. All
further client requests go to this process and are handled
by it.
@@ -83,20 +83,20 @@
<glossterm linkend="glossary-index">index</glossterm> files.
Because files are often larger than memory, it's likely that
the desired information is not (completely) available
- in the RAM. In this case the <firstterm>Backend process</firstterm>
+ in RAM. In this case the <firstterm>Backend process</firstterm>
must transfer additional file pages to
<firstterm>Shared Memory</firstterm>. Files are physically
organized in pages. Every transfer between files and
- RAM is performed in units of complete pages. Such transfers
- don't change the size or layout of pages.
+ RAM is performed in units of complete pages; such transfers
+ do not change the size or layout of pages.
</para>
<para>
- Reading file pages is notedly slower than reading
- RAM. This is the primary motivation for the existence of
+ Reading file pages is much slower than reading
+ RAM. This is the primary motivation for the use of
<firstterm>Shared Memory</firstterm>. As soon as one
of the <firstterm>Backend processes</firstterm> has
- read pages into memory, those pages are available for all
+ read pages into memory, those pages become available for all
other <firstterm>Backend processes</firstterm> for direct
access in RAM.
</para>
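Not part of the patch — just to illustrate the page-granularity transfer this hunk describes, here is a toy sketch (Python, purely illustrative; PAGE_SIZE matches PostgreSQL's default block size, everything else — BufferPool, the cache dict — is invented and is not how the real buffer manager works):

```python
import io

PAGE_SIZE = 8192  # PostgreSQL's default block size

class BufferPool:
    """Toy shared-memory cache: whole pages move between file and RAM."""
    def __init__(self, file):
        self.file = file
        self.cache = {}  # page number -> bytes, shared by all "backends"

    def read_page(self, pageno):
        if pageno not in self.cache:           # not yet in RAM
            self.file.seek(pageno * PAGE_SIZE)
            self.cache[pageno] = self.file.read(PAGE_SIZE)
        return self.cache[pageno]              # later readers hit the cache

# Two "backends" sharing one pool: the second read is served from RAM.
f = io.BytesIO(b"x" * (3 * PAGE_SIZE))
pool = BufferPool(f)
pool.read_page(1)
assert 1 in pool.cache and len(pool.cache[1]) == PAGE_SIZE
```

The point the text makes — transfers happen in units of complete pages, and a page read by one backend is afterwards available to all of them — is exactly what the shared cache dict stands in for here.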
@@ -121,13 +121,13 @@
First, whenever the content of a page changes, a
<glossterm linkend="glossary-wal-record">WAL record</glossterm>
is created out
- of the delta-information (difference between old and
- new content) and stored in another area of the
+ of the delta-information (difference between the old and
+ the new content) and stored in another area of
<firstterm>Shared Memory</firstterm>. These
<firstterm>WAL records</firstterm> are read by the
<firstterm>WAL Writer</firstterm> process,
which runs in parallel to the <firstterm>Backend
- processes</firstterm> and all other processes of
+ processes</firstterm> and other processes of
the <firstterm>Instance</firstterm>. It writes
the continuously arising <firstterm>WAL records</firstterm> to
the end of the current
@@ -137,7 +137,7 @@
to data files with <firstterm>heap</firstterm>
and <firstterm>index</firstterm> information.
As mentioned, this WAL-writing happens
- in an independent process. Nevertheless, all
+ in an independent process. All
<firstterm>WAL records</firstterm> created out of one
<firstterm>dirty page</firstterm> must be transferred
to disk before the <firstterm>dirty page</firstterm>
@@ -146,35 +146,34 @@
<para>
Second, the transfer of <firstterm>dirty buffers</firstterm>
- from <firstterm>Shared Memory</firstterm> to files must
- take place. This is the primary duty of the
+ from <firstterm>Shared Memory</firstterm> to files must
+ take place. This is the primary task of the
<firstterm>Background Writer</firstterm> process. Because
I/O activities can block other processes significantly,
it starts periodically and acts only for a short period.
- Doing so, his expensive I/O activities are spread over
- time, avoiding huge I/O peaks. Also, the <firstterm>
+ Doing so, its expensive I/O activities are spread over
+ time, avoiding debilitating I/O peaks. Also, the <firstterm>
Checkpointer</firstterm> process transfers
- <firstterm>dirty buffers</firstterm> to files —
+ <firstterm>dirty buffers</firstterm> to files —
see next paragraph.
</para>
<para>
- The <firstterm>Checkpointer</firstterm> has a special
- duty. As its name suggests, it has to create
- <firstterm>Checkpoints</firstterm>. Such a
+ The <firstterm>Checkpointer</firstterm> creates
+ <firstterm>Checkpoints</firstterm>. A
<glossterm linkend="glossary-checkpoint">Checkpoint</glossterm>
is a point in time when all older <firstterm>dirty buffers</firstterm>,
all older <firstterm>WAL records</firstterm>, and
finally a special <firstterm>Checkpoint record</firstterm>
have been written and flushed to disk.
- After a <firstterm>Checkpoint</firstterm>,
+ After a <firstterm>Checkpoint</firstterm>, we say
data files and <firstterm>WAL files</firstterm> are in sync.
In case of a recovery (after a crash of the instance)
- it is known that the information of all
+ it can be relied upon that the information of all
<firstterm>WAL records</firstterm> preceding
the last <firstterm>Checkpoint record</firstterm>
- is already integrated into the data files. This
- speeds up a possibly occurring recovery.
+ was already integrated into the data files. This
+ speeds up the recovery.
</para>
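To make the checkpoint/recovery relationship in this paragraph concrete, a toy replay model (illustrative only; the record format and function names are invented, not PostgreSQL's recovery code):

```python
# Toy WAL replay: only records after the last checkpoint must be applied,
# because a checkpoint guarantees that older changes already reached the
# data files.
wal = [
    ("data", "A=1"),
    ("data", "A=2"),
    ("checkpoint", None),   # everything before this is already on disk
    ("data", "B=7"),
]

def records_to_replay(wal):
    last_cp = max((i for i, (kind, _) in enumerate(wal)
                   if kind == "checkpoint"), default=-1)
    return [payload for kind, payload in wal[last_cp + 1:] if kind == "data"]

assert records_to_replay(wal) == ["B=7"]   # recovery skips A=1 and A=2
```

The more recent the last checkpoint record, the shorter the tail of WAL that recovery has to replay — which is the speedup the text refers to.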
<para>
@@ -184,7 +183,7 @@
Those <firstterm>WAL files</firstterm> — in combination with
a previously taken <firstterm>Base Backup</firstterm> —
are necessary to restore a database after a crash of the
- disk, where data files have been stored. Therefore it is
+ disk on which data files have been stored. Therefore it is
recommended to transfer a copy of the <firstterm>
WAL files</firstterm>
to a second, independent place. The purpose of the
@@ -201,10 +200,10 @@
</para>
<para>
- The <glossterm linkend="glossary-logger">Logger</glossterm> writes
+ The <glossterm linkend="glossary-logger">Logger</glossterm> process writes
text lines about serious and less serious events which can happen
- during database access, e.g., wrong password, no permission,
- long-running queries, ... .
+ during database access, e.g. wrong password, no permission,
+ long-running queries, etc.
</para>
</sect1>
@@ -214,16 +213,17 @@
<para>
<!-- TODO: Link to cluster -->
- On a <glossterm linkend="glossary-server">Server</glossterm>
- exists one or more <glossterm linkend="glossary-instance">Cluster</glossterm>,
- each of them contains three or more
- <glossterm linkend="glossary-database">databases</glossterm>, each
- database contains many <glossterm linkend="glossary-schema">schemas</glossterm>,
- a schema contains <glossterm linkend="glossary-table">tables</glossterm>,
+ A <glossterm linkend="glossary-server">Server</glossterm>
+ contains one or more <glossterm linkend="glossary-instance">Clusters</glossterm>.
+ Each cluster contains three or more
+ <glossterm linkend="glossary-database">databases</glossterm>. Each
+ database can contain many <glossterm linkend="glossary-schema">schemas</glossterm>
+ (one schema, 'public', is provided by default).
+ A schema can contain <glossterm linkend="glossary-table">tables</glossterm>,
<glossterm linkend="glossary-view">views</glossterm>, and a lot of other objects.
Each <firstterm>table</firstterm> or <firstterm>view</firstterm>
- belongs to a single <firstterm>schema</firstterm>; they cannot
- belong to another <firstterm>schema</firstterm>. The same is
+ belongs to a single <firstterm>schema</firstterm> only; it cannot
+ belong to any other <firstterm>schema</firstterm>. The same is
true for the schema/database and database/cluster relation.
<xref linkend="tutorial-cluster-db-schema-figure"/> visualizes
this hierarchy.
@@ -254,30 +254,30 @@
<para>
<literal>template0</literal> is the very first
<firstterm>database</firstterm> of any
- <firstterm>cluster</firstterm>. C-routines create
- <literal>template0</literal> during the initialization phase of
- the <firstterm>cluster</firstterm>.
- In a second step, <literal>template1</literal> is generated
- as a copy of <literal>template0</literal>, and finally
+ <firstterm>cluster</firstterm>.
+ Database <literal>template0</literal> is created during the
+ initialization phase of the <firstterm>cluster</firstterm>.
+ In a second step, database <literal>template1</literal> is generated
+ as a copy of <literal>template0</literal>, and finally database
<literal>postgres</literal> is generated as a copy of
- <literal>template1</literal>. All other
+ <literal>template1</literal>. Any
<glossterm linkend="app-createdb">new databases</glossterm>
- of this <firstterm>cluster</firstterm>,
- such as <literal>my_db</literal>, are also copied from
- <literal>template1</literal>. Due to the unique
- role of <literal>template0</literal> as the pristine origin
+ of the <firstterm>cluster</firstterm> that a user might need,
+ such as <literal>my_db</literal>, will be copied from the
+ <literal>template1</literal> database. Due to the unique
+ role of <literal>template0</literal> as the pristine original
of all other <firstterm>databases</firstterm>, no client
can connect to it.
</para>
<para>
- Every database contains <glossterm linkend="glossary-schema">
- schemas</glossterm>, and
+ Every database must contain <glossterm linkend="glossary-schema">
+ at least one schema</glossterm> because
<firstterm>schemas</firstterm> contain the other
<glossterm linkend="glossary-sql-object">SQL Objects</glossterm>.
<firstterm>Schemas</firstterm> are namespaces for
their <firstterm>SQL objects</firstterm> and ensure — with one
- exception — that within their scope names are used only once across all
+ exception — that within their scope, names are used only once across all
types of <firstterm>SQL objects</firstterm>. E.g., it is not possible
to have a table <literal>employee</literal> and a view
<literal>employee</literal> within the same
@@ -294,10 +294,10 @@
<para>
Some <firstterm>schemas</firstterm> are predefined.
<literal>public</literal> acts as the default
- <firstterm>schema</firstterm> and contains all such
- <firstterm>SQL objects</firstterm>, which are created
- within <literal>public</literal> or without using any schema
- name. <literal>public</literal> shall not contain user-defined
+ <firstterm>schema</firstterm> and contains all
+ <firstterm>SQL objects</firstterm> which are created
+ within <literal>public</literal> or without using an explicit schema
+ name. <literal>public</literal> should not contain user-defined
<firstterm>SQL objects</firstterm>. Instead, it is recommended to
create a separate <firstterm>schema</firstterm> that
holds individual objects like application-specific tables or
@@ -310,7 +310,7 @@
</para>
<para>
- There are a lot of different <firstterm>SQL object</firstterm>
+ There are many different <firstterm>SQL object</firstterm>
types: <firstterm>database, schema, table, view, materialized
view, index, constraint, sequence, function, procedure,
trigger, role, data type, operator, tablespace, extension,
@@ -353,7 +353,7 @@
<firstterm>Cluster</firstterm> has its root directory
somewhere in the file system. In many cases, the environment
variable <literal>PGDATA</literal> points to this directory.
- The example of the survey shown in
+ The example shown in
<xref linkend="tutorial-directories-figure"/> uses
<literal>data</literal> as the name of this root directory.
</para>
@@ -405,7 +405,7 @@
</para>
<para>
- Another prominent subdirectory is <literal>global</literal>.
+ Another subdirectory is <literal>global</literal>.
In analogy to the <firstterm>database</firstterm>-specific
subdirectories, there are files containing information about
<glossterm linkend="glossary-sql-object">Global SQL objects</glossterm>.
@@ -440,12 +440,12 @@
<para>
In the root directory <literal>data</literal>
there are also some files. In many cases, the configuration
- files of this <firstterm>cluster</firstterm> are stored
+ files of the <firstterm>cluster</firstterm> are stored
here. As long as the <firstterm>instance</firstterm>
is up and running, the file
<literal>postmaster.pid</literal> exists here
and contains the ID (pid) of the
- <firstterm>Postmaster</firstterm> process which
+ <firstterm>postmaster</firstterm> process which
has started the instance.
</para>
@@ -464,23 +464,23 @@
support many clients at the same time. Therefore, it is necessary to
protect concurrently running requests from unwanted overwriting
of other's data as well as from reading inconsistent data. Imagine an
- online shop offering the last copy of an article. Two clients show the
- article at their user interface. After a while, but at the same time,
+ online shop offering the last copy of an article. Two clients have the
+ article displayed in their user interface. After a while, but at the same time,
both users decide to put it to their shopping cart or even to buy it.
Both have seen the article, but only one can be allowed to get it.
The database must bring the two requests in a row, permit the access
- to one of them, block the other, and inform this one about the
- situation that the data was changed by a different process.
+ to one of them, block the other, and inform the blocked client
+ that the data was changed by a different process.
</para>
<para>
A first approach to implement protections against concurrent
accesses to the same data may be the locking of critical
- rows. There are two main categories of such techniques:
+ rows. Two such techniques are:
<emphasis>Optimistic Concurrency Control</emphasis> (OCC)
and <emphasis>Two Phase Locking</emphasis> (2PL).
- <productname>PostgreSQL</productname> implements the more
- sophisticated technique <firstterm>Multiversion Concurrency
+ <productname>PostgreSQL</productname> implements a third, more
+ sophisticated technique: <firstterm>Multiversion Concurrency
Control</firstterm> (MVCC). The crucial advantage of MVCC
over other technologies gets evident in multiuser OLTP
environments with a massive number of concurrent write
@@ -493,15 +493,15 @@
<para>
Instead of locking rows, the <firstterm>MVCC</firstterm> technique creates
- a new version of the same row when any data-change takes place. To
- distinguish between these versions as well as to track the timeline
+ a new version of the row when a data-change takes place. To
+ distinguish between these two versions and to track the timeline
of the row, each of the versions contains, in addition to their user-defined
columns, two special system columns, which are not visible
for the usual <command>SELECT * FROM ...</command> command.
The column <literal>xmin</literal> contains the transaction ID (xid)
- of the transaction, which creates this version of the row. Accordingly,
+ of the transaction, which created this version of the row. Accordingly,
<literal>xmax</literal> contains the xid of the transaction, which has
- deleted this version, respectively a zero, if the version is not
+ deleted this version, or zero, if the version is not
deleted. You can read both with the command
<command>SELECT xmin, xmax, * FROM ... </command>.
</para>
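The xmin/xmax bookkeeping this hunk describes can be modelled in a few lines (a simplified sketch, not PostgreSQL's visibility rules — it ignores in-progress and aborted transactions; all names are invented):

```python
# Toy MVCC versions: each carries xmin (creator xid) and xmax
# (deleter xid, or 0 while the version is still live).
def visible(version, my_xid, committed):
    """A version is visible if its creator committed (or is us) and no
    committed transaction has deleted it yet (heavily simplified)."""
    xmin, xmax = version["xmin"], version["xmax"]
    created = xmin in committed or xmin == my_xid
    deleted = xmax != 0 and (xmax in committed or xmax == my_xid)
    return created and not deleted

# An UPDATE leaves the old version in place and only stamps its xmax:
old = {"xmin": 123, "xmax": 135, "data": "x"}   # superseded by xid 135
new = {"xmin": 135, "xmax": 0,   "data": "y"}
committed = {123, 135}
assert not visible(old, my_xid=200, committed=committed)
assert visible(new, my_xid=200, committed=committed)
```

This is the essence of the "no locking, new version per change" idea: readers pick the version whose (xmin, xmax) window matches their snapshot instead of waiting on a lock.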
@@ -519,9 +519,9 @@
</para>
<para>
- Please note that the description in this chapter simplifies the situation
- by omitting details. When many transactions are running simultaneously,
- things can get very complicated. Sometimes they get aborted via
+ The description in this chapter simplifies matters by omitting details.
+ When many transactions are running simultaneously,
+ things can get complicated. Sometimes transactions get aborted via
ROLLBACK immediately or after a lot of other activities, sometimes
a single row is involved in more than one transaction, sometimes
a client crashes, sometimes the sequence of xids restarts
@@ -567,7 +567,7 @@
changing the user data from <literal>'x'</literal> to
<literal>'y'</literal>. According to the MVCC principles,
the data in the old version of the row does not change!
- The value <literal>'x'</literal> keeps as it was before.
+ The value <literal>'x'</literal> remains as it was before.
Only <literal>xmax</literal> changes to <literal>135</literal>.
Now, this version is treated as valid exclusively for
transactions with xids from <literal>123</literal> to
@@ -591,7 +591,7 @@
<para>
Finally, a row may be deleted by a <command>DELETE</command>
- command. Even in this case, all versions of the row keep as
+ command. Even in this case, all versions of the row remain as
before. Nothing is thrown away so far! Only <literal>xmax</literal>
of the last version changes to the xid of the <command>DELETE</command>
transaction, which indicates that it is only valid for
@@ -603,14 +603,14 @@
<para>
In summary, the MVCC technology creates more and more versions
of the same row in the table's heap file and leaves them there,
- even with a <command>DELETE</command> command. The youngest
+ even after a <command>DELETE</command> command. Only the youngest
version is relevant for all future transactions. But the
system must also preserve some of the older ones for a
- certain amount of time because the possiblility exists that
- they are or could become relevant for any of the pending
+ certain amount of time because the possibility exists that
+ they are or could become relevant for any pending
transactions. Over time, also the older ones get out of scope
for ALL transactions and therefore become unnecessary.
- Nevertheless, they exist physically on the disk and occupy
+ Nevertheless, they do exist physically on the disk and occupy
space.
</para>
@@ -629,7 +629,7 @@
xids grow, old row versions get out of scope over time.
If an old row version is no longer valid for ALL existing
transactions, it's called <firstterm>dead</firstterm>. The
- space occupied by the sum of all dead row versions is
+ space occupied by all dead row versions is
called <firstterm>bloat</firstterm>.
</simpara>
</listitem>
@@ -637,7 +637,7 @@
<listitem>
<simpara>
Internally, an <command>UPDATE</command> command acts in the
- same way as a <command>DELETE</command> command, followed by
+ same way as a <command>DELETE</command> command followed by
an <command>INSERT</command> command.
</simpara>
</listitem>
@@ -646,7 +646,7 @@
<simpara>
Nothing gets wiped away — with the consequence that the database
occupies more and more disk space. It is obvious that
- this behavior has to be automatically corrected in some
+ this behavior has to be corrected in some
way. The next chapter explains how AUTOVACUUM fulfills
this task.
</simpara>
@@ -664,7 +664,7 @@
more and more disk space, the <firstterm>bloat</firstterm>.
This chapter explains how the SQL command
<firstterm>VACUUM</firstterm> and the automatically running
- <firstterm>AUTOVACUUM</firstterm> processes clear the situation
+ <firstterm>AUTOVACUUM</firstterm> processes clean up
by eliminating <firstterm>bloat</firstterm>.
</para>
@@ -672,9 +672,9 @@
<para>
<firstterm>AUTOVACUUM</firstterm> runs automatically by
default. Its default parameters as well as such for
- <firstterm>VACUUM</firstterm> fits well for most standard
+ <firstterm>VACUUM</firstterm> fit well for most standard
situations. Therefore a novice database manager can
- easily skip the rest of this chapter, which explains
+ easily skip the rest of this chapter which explains
a lot of details.
</para>
</note>
@@ -682,16 +682,16 @@
<para>
Client processes can issue the SQL command VACUUM at arbitrary
points in time. DBAs do this when they recognize special situations,
- or they start it in batch jobs, which run periodically.
+ or they start it in batch jobs which run periodically.
AUTOVACUUM processes run as part of the
<link linkend="glossary-instance">Instance</link> at the server.
- There is a constantly running AUTOVACUUM daemon. He permanently
+ There is a constantly running AUTOVACUUM daemon. It permanently
controls the state of all databases based on values that are
collected by the <link linkend="glossary-stats-collector">
Statistics Collector</link> and starts
- AUTOVACUUM processes whenever he detects certain situations.
+ AUTOVACUUM processes whenever it detects certain situations.
Thus, it's a dynamic behavior of <productname>PostgreSQL</productname>
- with the intention to tidy up — not always, but whenever it
+ with the intention to tidy up — whenever it
is appropriate.
</para>
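For the "certain situations" the daemon reacts to: as far as I know the documented trigger for vacuuming a table is a threshold formula over the statistics the collector gathers. A sketch (defaults taken from the GUC docs, otherwise simplified and with invented names):

```python
# Autovacuum's per-table trigger condition (sketch): a vacuum of the
# table starts once its dead tuples exceed
#   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples.
def needs_autovacuum(dead_tuples, reltuples,
                     threshold=50, scale_factor=0.2):
    return dead_tuples > threshold + scale_factor * reltuples

assert not needs_autovacuum(dead_tuples=60, reltuples=1000)   # 60 <= 250
assert needs_autovacuum(dead_tuples=300, reltuples=1000)      # 300 > 250
```

The scale factor is why small tables are vacuumed after a handful of changes while a huge table accumulates proportionally more bloat before the daemon steps in.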
@@ -712,7 +712,7 @@
<firstterm>Freeze</firstterm>: Mark the youngest row version
as frozen. This means that the version
is always treated as valid (visible) independent from
- the <firstterm>wraparound problematic</firstterm> (see below).
+ the <firstterm>wraparound problem</firstterm> (see below).
</simpara>
</listitem>
@@ -729,22 +729,22 @@
<simpara>
<emphasis>Statistics</emphasis>: Collect statistics about the
number of rows per table, the distribution of values, and so on,
- as the basis for query planner's decision making.
+ as the basis for decisions of the query planner.
</simpara>
</listitem>
</itemizedlist>
<para>
- The eagerness — you can call it 'aggressivity' — of the
+ The eagerness — you can call it 'aggressiveness' — of the
operations <emphasis>eliminating bloat</emphasis> and
<emphasis>freeze</emphasis> is controlled by configuration
parameters, runtime flags, and in extreme situations by
- themselves. Because vacuum operations typically are I/O
+ the processes themselves. Because vacuum operations typically are I/O
intensive, which can hinder other activities, AUTOVACUUM
avoids performing many vacuum operations in bulk. Instead,
it carries out many small actions with time gaps in between.
- The SQL command VACUUM runs immediately without any
+ The SQL command VACUUM runs immediately and without any
time gaps.
</para>
@@ -794,8 +794,8 @@
After the vacuum operation detects a superfluous row version, it
marks its space as free for future use of writing
actions. Only in rare situations (or in the case of VACUUM FULL),
- this space is released to the operating system. In most cases,
- it keeps occupied by PostgreSQL and will be used by future
+ is this space released to the operating system. In most cases,
+ it remains occupied by PostgreSQL and will be used by future
<command>INSERT</command> or <command>UPDATE</command>
commands concerning this row or a completely different one.
</para>
@@ -827,7 +827,7 @@
<listitem>
<simpara>
When a client issues the SQL command VACUUM with the option FULL.
- Also, in this mode, the bloat disappears, but the used strategy
+ Also, in this mode, the bloat disappears, but the strategy used
is very different: In this case, the complete table is copied
to a different file skipping all outdated row versions. This
leads to a significant reduction of used disk space because
@@ -839,8 +839,8 @@
<listitem>
<simpara>
When an AUTOVACUUM process acts. For optimization
- purposes, he considers the Visibility Map in the same way as
- VACUUM. Additionally, he ignores tables with few modifications;
+ purposes, it considers the Visibility Map in the same way as
+ VACUUM. Additionally, it ignores tables with few modifications;
see <xref linkend="guc-autovacuum-vacuum-threshold"/>,
which defaults to 50 rows and
<xref linkend="guc-autovacuum-vacuum-scale-factor"/>,
@@ -864,7 +864,7 @@
a certain number of new transactions they are forced to restart
from the beginning, which is called <firstterm>wraparound</firstterm>.
Therefore the terms 'old transaction' / 'young transaction' does
- not always correlate with low / hight values of xids. Near to the
+ not always correlate with low / high values of xids. Near to the
wraparound point, there are cases where xmin has a higher value
than xmax, although their meaning is said to be older than xmax.
</para>
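The "higher xmin than xmax yet older" effect near the wraparound point follows from circular comparison of xids. A sketch of modulo-2^31 ordering in the style PostgreSQL uses internally (illustrative; the function name is invented):

```python
# Circular xid comparison: with 32-bit xids that wrap around, "older"
# cannot mean "numerically smaller". Instead, a precedes b if the
# signed 32-bit difference (a - b) is negative, i.e. b lies within the
# 2^31 xids "ahead of" a.
def xid_precedes(a, b):
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 2**31          # negative when read as signed 32-bit

assert xid_precedes(100, 200)              # plainly older
# Near the wraparound point a numerically *higher* xid can be older:
assert xid_precedes(2**32 - 10, 5)         # 4294967286 precedes 5
assert not xid_precedes(5, 2**32 - 10)
```

So each xid effectively sees half the circle as its past and half as its future — which is exactly the two-halves split the following section describes.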
@@ -891,10 +891,10 @@
The use of a limited range of IDs for transactions leads
to the necessity to restart the sequence sooner or later.
This does not only have the rare consequence previously
- described that sometimes xmin is huger than xmax. The far
+ described that sometimes xmin is higher than xmax. The far
more critical problem is that whenever the system has
to evaluate a WHERE condition, it must decide which row
- versions are valid (visible) from the perspective of the
+ version is valid (visible) from the perspective of the
transaction of this query. If a wraparound couldn't happen,
this decision would be relatively easy: the xid
must be between xmin and xmax, and the corresponding
@@ -911,7 +911,7 @@
<listitem>
<simpara>
- In a first step, PostgreSQL divides the complete range of
+ As a first step, PostgreSQL divides the complete range of
possible xids into two halves with the two split-points
'txid_current' and 'txid_current + 2^31'. The half behind
'txid_current' is considered to represent xids of the
@@ -975,13 +975,13 @@
</para>
<para>
- At what point in time the freeze operation will take place?
+ At what point in time does the freeze operation take place?
<itemizedlist>
<listitem>
<simpara>
When a client issues the SQL command VACUUM with its
- FREEZE option. In this case, all such pages are
+ FREEZE option. In this case, all pages are
processed that are marked in the Visibility Map
to potentially have unfrozen rows.
</simpara>
@@ -989,12 +989,12 @@
<listitem>
<simpara>
When a client issues the SQL command VACUUM without any
- option but finds that there are xids older than
+ options but finds that there are xids older than
<xref linkend="guc-vacuum-freeze-table-age"/>
(default: 150 million) minus
<xref linkend="guc-vacuum-freeze-min-age"/>
(default: 50 million).
- As before, all such pages are processed that are
+ As before, all pages are processed that are
marked in the Visibility Map to potentially have unfrozen
rows.
</simpara>
@@ -1008,11 +1008,11 @@
<itemizedlist>
<listitem>
<simpara>
- In the <emphasis>normal mode</emphasis>, he skips
+ In the <emphasis>normal mode</emphasis>, it skips
pages with row versions that are younger than
<xref linkend="guc-vacuum-freeze-min-age"/>
(default: 50 million) and works only on pages where
- all xids are older. The skipping of jung xids prevents
+ all xids are older. The skipping of young xids prevents
work on such pages, which are likely to be changed
by one of the future SQL commands.
</simpara>
@@ -1020,7 +1020,7 @@
<listitem>
<simpara>
The process switches
- to an <emphasis>aggressive mode</emphasis> if he recognizes
+ to an <emphasis>aggressive mode</emphasis> if it recognizes
that for the processed table their oldest xid exceeds
<xref linkend="guc-autovacuum-freeze-max-age"/>
(default: 200 million). The value of the oldest unfrozen
@@ -1038,12 +1038,12 @@
<para>
In the first two cases and with autovacuum in
- <emphasis>aggressive mode</emphasis>, the system knowns
+ <emphasis>aggressive mode</emphasis>, the system knows
to which value the oldest unfrozen xid has moved forward and
logs the value in <emphasis>pg_class.relfrozenxid</emphasis>.
The distance between this value and the 'txid_current' split
point becomes smaller, and the distance to 'txid_current + 2^31'
- larger than before.
+ becomes larger than before.
</para>
<figure id="tutorial-freeze-figure">
@@ -1069,7 +1069,7 @@
running <firstterm>autovacuum daemon</firstterm>. If the
daemon detects that for a table <firstterm>
autovacuum_freeze_max_age</firstterm> is exceeded, it starts
- an AUTOVACUUM process in the <emphasis>aggressive mode</emphasis>
+ an AUTOVACUUM process in <emphasis>aggressive mode</emphasis>
(see above) — even if AUTOVACUUM is disabled.
</para>
@@ -1079,8 +1079,8 @@
The <link linkend="glossary-vm">Visibility Map</link>
(VM) contains two flags — stored as
two bits — for each page of the heap. If the first bit
- is set, it indicates that the associated page does not
- contain any bloat. If the second one is set, it indicates
+ is set, that indicates that the associated page does not
+ contain any bloat. If the second one is set, that indicates
that the page contains only frozen rows.
</para>
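The two-flags-per-page idea is simple enough to sketch (a toy model with invented names; the real VM packs two bits per page, here it's a byte per page for readability):

```python
# Toy visibility map: two flags per heap page, as described above —
# ALL_VISIBLE (the page contains no bloat) and ALL_FROZEN (the page
# contains only frozen rows).
ALL_VISIBLE = 0b01
ALL_FROZEN  = 0b10

class VisibilityMap:
    def __init__(self, npages):
        self.bits = bytearray(npages)      # one byte per page in this toy

    def set_flag(self, pageno, flag):
        self.bits[pageno] |= flag

    def clear(self, pageno):               # any data change clears both
        self.bits[pageno] = 0

    def all_frozen(self, pageno):
        return bool(self.bits[pageno] & ALL_FROZEN)

vm = VisibilityMap(4)
vm.set_flag(2, ALL_VISIBLE | ALL_FROZEN)
assert vm.all_frozen(2)
vm.clear(2)                                # an UPDATE touched page 2
assert not vm.all_frozen(2)
```

Vacuum can then skip every page whose flags are still set, which is where the acceleration mentioned a few paragraphs further down comes from.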
@@ -1099,7 +1099,7 @@
<para>
The setting of the flags is silently done by VACUUM
and AUTOVACUUM during their bloat and freeze operations.
- This is done to accelerate future vacuum actions,
+ This is done to speed up future vacuum actions,
regular accesses to heap pages, and some accesses to
the index. Every data-modifying operation on any row
version of the page clears the flags.
@@ -1122,7 +1122,7 @@
linkend="planner-stats">Query Planner</link> to make optimal
decisions for the generation of execution plans. This
information can be gathered with the SQL commands ANALYZE
- or VACUUM ANALYZE. But also autovacuum processes gather
+ or VACUUM ANALYZE. But autovacuum processes also gather
such information. Depending on the percentage of changed rows
per table <xref linkend="guc-autovacuum-analyze-scale-factor"/>,
the autovacuum daemon starts autovacuum processes to collect
@@ -1144,7 +1144,7 @@
<link linkend="tutorial-transactions">Transactions</link>
are a fundamental concept of relational database systems.
Their essential point is that they bundle multiple
- read- or write-operations into a single, all-or-nothing
+ read- or write-operations into a single all-or-nothing
operation. Furthermore, they separate and protect concurrent
actions of different connections from each other. Thereby
they implement the ACID paradigm.
@@ -1152,7 +1152,7 @@
<para>
In <productname>PostgreSQL</productname> there are two ways
- to establish a transaction. The explicite way uses the keywords
+ to establish a transaction. The explicit way uses the keywords
<link linkend="sql-begin">BEGIN</link> and
<link linkend="sql-commit">COMMIT</link> (respectively
<link linkend="sql-rollback">ROLLBACK</link>) before
@@ -1188,9 +1188,9 @@
</para>
<para>
- The atomicity also affects the visibility of changes. All
- connections running simultaneously to a data modifying
- transaction will never see any change before the
+ The atomicity also affects the visibility of changes. No
+ connection running concurrently with a data-modifying
+ transaction will ever see any change before the
transaction successfully executes a <command>COMMIT</command>
— even in the lowest
<link linkend="transaction-iso">isolation level</link>
@@ -1221,7 +1221,7 @@
</para>
<para>
- <productname>PostgreSQL</productname> overcomes the
- problem by showing only such row versions to other
- transactions whose originating transaction is
+ <productname>PostgreSQL</productname> overcomes this
+ problem by showing other transactions only those row
+ versions whose originating transaction has been
successfully committed. It skips all row versions of
@@ -1229,9 +1229,9 @@
<productname>PostgreSQL</productname> solves one more
problem. Even the single <command>COMMIT</command>
command needs a short time interval for its execution.
- Therefor its critical 'dead-or-survival' phase
- runs in a priviledged mode where it cannot be
- interupted by other processes.
+ Therefore its critical 'dead-or-survival' phase
+ runs in a privileged mode where it cannot be
+ interrupted by other processes.
</para>
<bridgehead renderas="sect2">What are the benefits?</bridgehead>
@@ -1247,10 +1247,10 @@
the transfers of some money from one account to another.
It is obvious
that the decrease of the one and the increase of the
- other are impartible. Nevertheless, there is no particular
+ other must be indivisible. Nevertheless, there is no particular
need for an application to do something to ensure the
<glossterm linkend="glossary-atomicity">atomicity</glossterm>
- of its behavior. It's enough to surround them with
+ of this behavior. It's enough to surround them with
<command>BEGIN</command> and <command>COMMIT</command>.
</para>
@@ -1260,10 +1260,10 @@
conditions. In such cases, the application simply issues a
<command>ROLLBACK</command> command instead of a
<command>COMMIT</command>. The <command>ROLLBACK</command>
- cancels the transaction, and all changes made so far retain
+ cancels the transaction, and all changes made so far remain
invisible forever; it's like they never happened. There
is no need for the application to log its activities and
- undo every single step.
+ undo every step of the transaction separately.
</para>
<para>
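(Aside: in other words, error handling in the application can reduce to a single command; roughly, with the same hypothetical accounts table:)

```sql
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE owner = 'Alice';
-- the application detects a problem, e.g. insufficient funds
ROLLBACK;  -- every change since BEGIN vanishes; no manual undo needed
```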
@@ -1282,7 +1282,7 @@
<para>
Also, all self-evident — but possibly not obvious
— low-level demands on the database system are
- ensured; e.g., index entries for rows must become
+ ensured; e.g. index entries for rows must become
visible at the same moment as the rows themselves.
</para>
@@ -1307,7 +1307,7 @@
<para>
Nothing is perfect and failures inevitably happen.
- However, the most common types of failures are
+ However, the most common types of failure are
well known and <productname>PostgreSQL</productname>
implements strategies to overcome them.
Such strategies use parts of the previously presented
@@ -1356,7 +1356,7 @@
actions. The WAL records are written first. Second,
the data itself shall exist in the heap and index files.
- In opposite to the WAL records, this part may or may
- not be transferred entirely from shared buffers to the files.
+ In contrast to the WAL records, this part may or may
+ not have been transferred entirely from shared buffers (RAM) to the files.
</para>
<para>
The automatic recovery searches within the WAL files for
@@ -1378,7 +1378,7 @@
<bridgehead renderas="sect3">Disk crash</bridgehead>
<para>
- If a disk crashes, the course of actions described previously
+ If a disk crashes, the course of action described previously
cannot work. It is likely that the WAL files and/or the
data and index files are no longer available. You need
to take special actions to overcome such situations.
@@ -1453,14 +1453,14 @@
<link linkend="backup-file">copy</link>
of the cluster's directory structure and files. In
case of severe problems such a copy can serve as
- the source of a recovery. But in order to get a
+ the source of recovery. But in order to get a
<emphasis>USABLE</emphasis> backup by this method,
the database server <emphasis>MUST</emphasis> be
- shut down during the complete runtime of the copy
+ shut down for the entire duration of the copy
command!
</para>
<para>
- The apparent disadvantage of this method is that there
+ The obvious disadvantage of this method is that there
is a downtime where no user interaction is possible.
</para>
@@ -1515,7 +1515,7 @@
If configured, the
<glossterm linkend="glossary-wal-archiver">Archiver process</glossterm>
- will automatically copy every single WAL file to a save location.
+ will automatically copy every single WAL file to a safe location.
- <link linkend="backup-archiving-wal">It's configuration</link>
+ <link linkend="backup-archiving-wal">Its configuration</link>
consists mainly of a string, which contains a copy command
in the operating system's syntax. In order to protect your
data against a disk crash, the destination location