Cleanup of syntax.sgml

Joshua Drake Fri, 20 Jun 2025 12:33:38 -0700

To make it more consumable.

--


   - Founder - https://commandprompt.com/ - 24x7x365 Postgres since 1997
   - Founder and Co-Chair - https://postgresconf.org/
   - Founder - https://postgresql.us - United States PostgreSQL
   - Public speaker, published author, postgresql expert, and people
   believer.
   - Host - More than a refresh
   <https://commandprompt.com/about/more-than-a-refresh/>: A podcast about
   data and the people who wrangle it.

diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 916189a7d68..900f8f8e441 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -15,10 +15,10 @@
  </para>
 
  <para>
-  We also advise users who are already familiar with SQL to read this
-  chapter carefully because it contains several rules and concepts that
-  are implemented inconsistently among SQL databases or that are
-  specific to <productname>PostgreSQL</productname>.
+  We advise users who to read this chapter carefully because it 
+  contains several rules and concepts that are implemented 
+  inconsistently among SQL databases or that are specific to 
+  <productname>PostgreSQL</productname>.
  </para>
 
  <sect1 id="sql-syntax-lexical">
@@ -29,53 +29,48 @@
   </indexterm>
 
   <para>
-   SQL input consists of a sequence of
+   SQL consists of a sequence of
    <firstterm>commands</firstterm>.  A command is composed of a
    sequence of <firstterm>tokens</firstterm>, terminated by a
-   semicolon (<quote>;</quote>).  The end of the input stream also
-   terminates a command.  Which tokens are valid depends on the syntax
-   of the particular command.
+   semicolon (<quote>;</quote>). Which tokens are valid depends on 
+   the syntax of the particular command.
   </para>
 
   <para>
    A token can be a <firstterm>key word</firstterm>, an
    <firstterm>identifier</firstterm>, a <firstterm>quoted
    identifier</firstterm>, a <firstterm>literal</firstterm> (or
-   constant), or a special character symbol.  Tokens are normally
-   separated by whitespace (space, tab, newline), but need not be if
-   there is no ambiguity (which is generally only the case if a
-   special character is adjacent to some other token type).
+   constant), or a special character symbol.
   </para>
 
    <para>
-    For example, the following is (syntactically) valid SQL input:
+    The following is (syntactically) valid SQL input:
 <programlisting>
 SELECT * FROM MY_TABLE;
 UPDATE MY_TABLE SET A = 5;
 INSERT INTO MY_TABLE VALUES (3, 'hi there');
 </programlisting>
-    This is a sequence of three commands, one per line (although this
-    is not required; more than one command can be on a line, and
-    commands can usefully be split across lines).
+    This is a sequence of three commands, one per line. More
+    than one command can be on a line, and commands can also be split 
+    across lines.
    </para>
 
   <para>
-   Additionally, <firstterm>comments</firstterm> can occur in SQL
-   input.  They are not tokens, they are effectively equivalent to
-   whitespace.
+   Additionally, <firstterm>comments</firstterm> are not tokens and 
+   can occur in SQL input.
   </para>
 
   <para>
-   The SQL syntax is not very consistent regarding what tokens
-   identify commands and which are operands or parameters.  The first
-   few tokens are generally the command name, so in the above example
-   we would usually speak of a <quote>SELECT</quote>, an
-   <quote>UPDATE</quote>, and an <quote>INSERT</quote> command.  But
-   for instance the <command>UPDATE</command> command always requires
-   a <token>SET</token> token to appear in a certain position, and
-   this particular variation of <command>INSERT</command> also
-   requires a <token>VALUES</token> in order to be complete.  The
-   precise syntax rules for each command are described in <xref linkend="reference"/>.
+   SQL is not consistent regarding what tokens identify commands and 
+   which are operands or parameters.  The first few tokens are 
+   generally the command name, so in the above example we would 
+   speak of a <quote>SELECT</quote>, an <quote>UPDATE</quote>, and 
+   an <quote>INSERT</quote> command.  But for the 
+   <command>UPDATE</command> command always requires a <token>SET</token> 
+   token to appear in a certain position, and this particular 
+   variation of <command>INSERT</command> also requires a 
+   <token>VALUES</token> in order to be complete.  The precise syntax 
+   rules for each command are described in <xref linkend="reference"/>.
   </para>
 
   <sect2 id="sql-syntax-identifiers">
@@ -98,11 +93,10 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there');
 
    <para>
     Tokens such as <token>SELECT</token>, <token>UPDATE</token>, or
-    <token>VALUES</token> in the example above are examples of
-    <firstterm>key words</firstterm>, that is, words that have a fixed
-    meaning in the SQL language.  The tokens <token>MY_TABLE</token>
-    and <token>A</token> are examples of
-    <firstterm>identifiers</firstterm>.  They identify names of
+    <token>VALUES</token> in the example above are 
+    <firstterm>key words</firstterm> and they have a fixed meaning in 
+    SQL.  The tokens <token>MY_TABLE</token> and <token>A</token> are
+    examples of <firstterm>identifiers</firstterm>.  They identify names of
     tables, columns, or other database objects, depending on the
     command they are used in.  Therefore they are sometimes simply
     called <quote>names</quote>.  Key words and identifiers have the
@@ -119,24 +113,12 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there');
     (<literal>_</literal>).  Subsequent characters in an identifier or
     key word can be letters, underscores, digits
     (<literal>0</literal>-<literal>9</literal>), or dollar signs
-    (<literal>$</literal>).  Note that dollar signs are not allowed in identifiers
-    according to the letter of the SQL standard, so their use might render
-    applications less portable.
-    The SQL standard will not define a key word that contains
-    digits or starts or ends with an underscore, so identifiers of this
-    form are safe against possible conflict with future extensions of the
-    standard.
-   </para>
-
-   <para>
-    <indexterm><primary>identifier</primary><secondary>length</secondary></indexterm>
-    The system uses no more than <symbol>NAMEDATALEN</symbol>-1
-    bytes of an identifier; longer names can be written in
-    commands, but they will be truncated.  By default,
-    <symbol>NAMEDATALEN</symbol> is 64 so the maximum identifier
-    length is 63 bytes. If this limit is problematic, it can be raised by
-    changing the <symbol>NAMEDATALEN</symbol> constant in
-    <filename>src/include/pg_config_manual.h</filename>.
+    (<literal>$</literal>).  Note that dollar signs are not allowed in 
+    identifiers according to the letter of the SQL standard, so their use 
+    might render applications less portable. The SQL standard will not 
+    define a key word that contains digits or starts or ends with an 
+    underscore, so identifiers of this form are safe against possible 
+    conflict with future extensions of the standard.
    </para>
 
    <para>
@@ -152,7 +134,7 @@ UPDATE MY_TABLE SET A = 5;
 <programlisting>
 uPDaTE my_TabLE SeT a = 5;
 </programlisting>
-    A convention often used is to write key words in upper
+    The recommened convention is to write key words in upper
     case and names in lower case, e.g.:
 <programlisting>
 UPDATE my_table SET a = 5;
@@ -183,27 +165,25 @@ UPDATE "my_table" SET "a" = 5;
 
    <para>
     Quoted identifiers can contain any character, except the character
-    with code zero.  (To include a double quote, write two double quotes.)
-    This allows constructing table or column names that would
-    otherwise not be possible, such as ones containing spaces or
-    ampersands.  The length limitation still applies.
+    with code zero. This allows constructing table or column names that 
+    would otherwise not be possible, such as ones containing spaces or
+    ampersands.
    </para>
 
    <para>
-    Quoting an identifier also makes it case-sensitive, whereas
-    unquoted names are always folded to lower case.  For example, the
-    identifiers <literal>FOO</literal>, <literal>foo</literal>, and
+    Quoting an identifier makes it case-sensitive, whereas unquoted names 
+    are always folded to lower case.  For example, the identifiers 
+    <literal>FOO</literal>, <literal>foo</literal>, and 
     <literal>"foo"</literal> are considered the same by
-    <productname>PostgreSQL</productname>, but
-    <literal>"Foo"</literal> and <literal>"FOO"</literal> are
-    different from these three and each other.  (The folding of
-    unquoted names to lower case in <productname>PostgreSQL</productname> is
-    incompatible with the SQL standard, which says that unquoted names
-    should be folded to upper case.  Thus, <literal>foo</literal>
-    should be equivalent to <literal>"FOO"</literal> not
-    <literal>"foo"</literal> according to the standard.  If you want
-    to write portable applications you are advised to always quote a
-    particular name or never quote it.)
+    <productname>PostgreSQL</productname>, but <literal>"Foo"</literal> 
+    and <literal>"FOO"</literal> are not considered the same.
+   </para>
+   <para>
+    The folding of unquoted names to lower case in is incompatible with the 
+    SQL standard.  Thus, <literal>foo</literal> should be equivalent to 
+    <literal>"FOO"</literal> not <literal>"foo"</literal> according to the 
+    standard.  If you want to write portable applications you are advised to 
+    always quote a particular name or never quote it.
    </para>
 
    <indexterm>
@@ -212,20 +192,16 @@ UPDATE "my_table" SET "a" = 5;
    </indexterm>
 
    <para>
-    A variant of quoted
-    identifiers allows including escaped Unicode characters identified
-    by their code points.  This variant starts
-    with <literal>U&amp;</literal> (upper or lower case U followed by
+    A variant of quoted identifiers allows including escaped Unicode 
+    characters identified by their code points.  This variant starts 
+    with <literal>U&amp;</literal> (upper or lower case U followed by 
     ampersand) immediately before the opening double quote, without
     any spaces in between, for example <literal>U&amp;"foo"</literal>.
-    (Note that this creates an ambiguity with the
-    operator <literal>&amp;</literal>.  Use spaces around the operator to
-    avoid this problem.)  Inside the quotes, Unicode characters can be
-    specified in escaped form by writing a backslash followed by the
-    four-digit hexadecimal code point number or alternatively a
-    backslash followed by a plus sign followed by a six-digit
-    hexadecimal code point number.  For example, the
-    identifier <literal>"data"</literal> could be written as
+    Inside the quotes, Unicode characters can be specified in escaped 
+    form by writing a backslash followed by the four-digit hexadecimal 
+    code point number or alternatively a backslash followed by a plus 
+    sign followed by a six-digit hexadecimal code point number.  For 
+    example, the identifier <literal>"data"</literal> could be written as
 <programlisting>
 U&amp;"d\0061t\+000061"
 </programlisting>
@@ -330,9 +306,8 @@ SELECT 'foobar';
 <programlisting>
 SELECT 'foo'      'bar';
 </programlisting>
-     is not valid syntax.  (This slightly bizarre behavior is specified
-     by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
-     following the standard.)
+     is not valid syntax.  This behavior is specified by
+     the <acronym>SQL</acronym> standard. 
     </para>
    </sect3>
 
@@ -351,9 +326,9 @@ SELECT 'foo'      'bar';
      string constants, which are an extension to the SQL standard.
      An escape string constant is specified by writing the letter
      <literal>E</literal> (upper or lower case) just before the opening single
-     quote, e.g., <literal>E'foo'</literal>.  (When continuing an escape string
+     quote, e.g., <literal>E'foo'</literal>.  When continuing an escape string
      constant across lines, write <literal>E</literal> only before the first opening
-     quote.)
+     quote.
      Within an escape string, a backslash character (<literal>\</literal>) begins a
      C-like <firstterm>backslash escape</firstterm> sequence, in which the combination
      of backslash and following character(s) represent a special byte
@@ -432,37 +407,9 @@ SELECT 'foo'      'bar';
      It is your responsibility that the byte sequences you create,
      especially when using the octal or hexadecimal escapes, compose
      valid characters in the server character set encoding.
-     A useful alternative is to use Unicode escapes or the
-     alternative Unicode escape syntax, explained
-     in <xref linkend="sql-syntax-strings-uescape"/>; then the server
+     If you can use Unicode escapes or the alternative Unicode escape syntax, 
+     explained in <xref linkend="sql-syntax-strings-uescape"/>; then the server
      will check that the character conversion is possible.
-    </para>
-
-    <caution>
-    <para>
-     If the configuration parameter
-     <xref linkend="guc-standard-conforming-strings"/> is <literal>off</literal>,
-     then <productname>PostgreSQL</productname> recognizes backslash escapes
-     in both regular and escape string constants.  However, as of
-     <productname>PostgreSQL</productname> 9.1, the default is <literal>on</literal>, meaning
-     that backslash escapes are recognized only in escape string constants.
-     This behavior is more standards-compliant, but might break applications
-     which rely on the historical behavior, where backslash escapes
-     were always recognized.  As a workaround, you can set this parameter
-     to <literal>off</literal>, but it is better to migrate away from using backslash
-     escapes.  If you need to use a backslash escape to represent a special
-     character, write the string constant with an <literal>E</literal>.
-    </para>
-
-    <para>
-     In addition to <varname>standard_conforming_strings</varname>, the configuration
-     parameters <xref linkend="guc-escape-string-warning"/> and
-     <xref linkend="guc-backslash-quote"/> govern treatment of backslashes
-     in string constants.
-    </para>
-    </caution>
-
-    <para>
      The character with the code zero cannot be in a string constant.
     </para>
    </sect3>
@@ -482,9 +429,9 @@ SELECT 'foo'      'bar';
      constant starts with <literal>U&amp;</literal> (upper or lower case
      letter U followed by ampersand) immediately before the opening
      quote, without any spaces in between, for
-     example <literal>U&amp;'foo'</literal>.  (Note that this creates an
+     example <literal>U&amp;'foo'</literal>.  This creates an
      ambiguity with the operator <literal>&amp;</literal>.  Use spaces
-     around the operator to avoid this problem.)  Inside the quotes,
+     around the operator to avoid this problem.  Inside the quotes,
      Unicode characters can be specified in escaped form by writing a
      backslash followed by the four-digit hexadecimal code point
      number or alternatively a backslash followed by a plus sign
@@ -493,7 +440,7 @@ SELECT 'foo'      'bar';
 <programlisting>
 U&amp;'d\0061t\+000061'
 </programlisting>
-     The following less trivial example writes the Russian
+     The following example writes the Russian
      word <quote>slon</quote> (elephant) in Cyrillic letters:
 <programlisting>
 U&amp;'\0441\043B\043E\043D'
@@ -532,17 +479,6 @@ U&amp;'d!0061t!+000061' UESCAPE '!'
      by one of these escape sequences is converted to the actual server
      encoding; an error is reported if that's not possible.
     </para>
-
-    <para>
-     Also, the Unicode escape syntax for string constants only works
-     when the configuration
-     parameter <xref linkend="guc-standard-conforming-strings"/> is
-     turned on.  This is because otherwise this syntax could confuse
-     clients that parse the SQL statements to the point that it could
-     lead to SQL injections and similar security issues.  If the
-     parameter is set to off, this syntax will be rejected with an
-     error message.
-    </para>
    </sect3>
 
    <sect3 id="sql-syntax-dollar-quoting">
@@ -555,24 +491,22 @@ U&amp;'d!0061t!+000061' UESCAPE '!'
     <para>
      While the standard syntax for specifying string constants is usually
      convenient, it can be difficult to understand when the desired string
-     contains many single quotes, since each of those must
-     be doubled. To allow more readable queries in such situations,
-     <productname>PostgreSQL</productname> provides another way, called
-     <quote>dollar quoting</quote>, to write string constants.
-     A dollar-quoted string constant
-     consists of a dollar sign (<literal>$</literal>), an optional
-     <quote>tag</quote> of zero or more characters, another dollar
-     sign, an arbitrary sequence of characters that makes up the
-     string content, a dollar sign, the same tag that began this
-     dollar quote, and a dollar sign. For example, here are two
-     different ways to specify the string <quote>Dianne's horse</quote>
-     using dollar quoting:
+     contains many single quotes or backslashes. To allow more readable 
+     queries in such situations, <productname>PostgreSQL</productname> 
+     provides <quote>dollar quoting</quote> to write string constants.
+     A dollar-quoted string constant consists of a dollar sign 
+     (<literal>$</literal>), an optional <quote>tag</quote> of zero or 
+     more characters, another dollar sign, an arbitrary sequence of 
+     characters that makes up the string content, a dollar sign, the same 
+     tag that began this dollar quote, and a dollar sign. For example, here 
+     are two different ways to specify the string 
+     <quote>Dianne's horse</quote> using dollar quoting:
 <programlisting>
 $$Dianne's horse$$
 $SomeTag$Dianne's horse$SomeTag$
 </programlisting>
      Notice that inside the dollar-quoted string, single quotes can be
-     used without needing to be escaped.  Indeed, no characters inside
+     used without needing to be escaped. No characters inside
      a dollar-quoted string are ever escaped: the string content is always
      written literally.  Backslashes are not special, and neither are
      dollar signs, unless they are part of a sequence matching the opening
@@ -590,13 +524,12 @@ BEGIN
 END;
 $function$
 </programlisting>
-     Here, the sequence <literal>$q$[\t\r\n\v\\]$q$</literal> represents a
+     The sequence <literal>$q$[\t\r\n\v\\]$q$</literal> represents a
      dollar-quoted literal string <literal>[\t\r\n\v\\]</literal>, which will
      be recognized when the function body is executed by
-     <productname>PostgreSQL</productname>.  But since the sequence does not match
+     <productname>PostgreSQL</productname>.  Since the sequence does not match
      the outer dollar quoting delimiter <literal>$function$</literal>, it is
-     just some more characters within the constant so far as the outer
-     string is concerned.
+     just some more characters within the constant.
     </para>
 
     <para>
@@ -634,7 +567,7 @@ $function$
     </indexterm>
 
     <para>
-     Bit-string constants look like regular string constants with a
+     Bit-string constants is a string constant with a
      <literal>B</literal> (upper or lower case) immediately before the
      opening quote (no intervening whitespace), e.g.,
      <literal>B'1001'</literal>.  The only characters allowed within
@@ -654,6 +587,7 @@ $function$
      across lines in the same way as regular string constants.
      Dollar quoting cannot be used in a bit-string constant.
     </para>
+
    </sect3>
 
    <sect3 id="sql-syntax-constants-numeric">
@@ -665,7 +599,7 @@ $function$
     </indexterm>
 
     <para>
-     Numeric constants are accepted in these general forms:
+     Numeric constants are accepted in these forms:
 <synopsis>
 <replaceable>digits</replaceable>
 <replaceable>digits</replaceable>.<optional><replaceable>digits</replaceable></optional><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
@@ -678,13 +612,12 @@ $function$
      exponent marker (<literal>e</literal>), if one is present.
      There cannot be any spaces or other characters embedded in the
      constant, except for underscores, which can be used for visual grouping as
-     described below.  Note that any leading plus or minus sign is not actually
-     considered part of the constant; it is an operator applied to the
-     constant.
+     described below.  Any leading plus or minus sign is not considered part of 
+     the constant; it is an operator applied to the constant.
     </para>
 
     <para>
-     These are some examples of valid numeric constants:
+     Examples of valid numeric constants:
 <literallayout>
 42
 3.5
@@ -711,7 +644,7 @@ $function$
     </para>
 
     <para>
-     These are some examples of valid non-decimal integer constants:
+     Examples of valid non-decimal integer constants:
 <literallayout>
 0b100101
 0B10011001
@@ -724,7 +657,7 @@ $function$
 
     <para>
      For visual grouping, underscores can be inserted between digits.  These
-     have no further effect on the value of the constant.  For example:
+     have no effect on the value of the constant.  For example:
 <literallayout>
 1_500_000_000
 0b10001000_00000000
@@ -743,7 +676,7 @@ $function$
      <indexterm><primary>bigint</primary></indexterm>
      <indexterm><primary>numeric</primary></indexterm>
      A numeric constant that contains neither a decimal point nor an
-     exponent is initially presumed to be type <type>integer</type> if its
+     exponent is presumed to be type <type>integer</type> if its
      value fits in type <type>integer</type> (32 bits); otherwise it is
      presumed to be type <type>bigint</type> if its
      value fits in type <type>bigint</type> (64 bits); otherwise it is
@@ -767,8 +700,8 @@ REAL '1.23'  -- string style
 1.23::REAL   -- PostgreSQL (historical) style
 </programlisting>
 
-     These are actually just special cases of the general casting
-     notations discussed next.
+     These are special cases of the general casting notations discussed 
+     next.
     </para>
    </sect3>
 
@@ -863,8 +796,9 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
 
      <listitem>
       <para>
-       A multiple-character operator name cannot end in <literal>+</literal> or <literal>-</literal>,
-       unless the name also contains at least one of these characters:
+       A multiple-character operator name cannot end in <literal>+</literal> 
+       or <literal>-</literal>, unless the name also contains at least one 
+       of these characters:
 <literallayout>
 ~ ! @ # % ^ &amp; | ` ?
 </literallayout>
@@ -878,7 +812,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
    </para>
 
    <para>
-    When working with non-SQL-standard operator names, you will usually
+    When working with non-SQL-standard operator names, you will
     need to separate adjacent operators with spaces to avoid ambiguity.
     For example, if you have defined a prefix operator named <literal>@</literal>,
     you cannot write <literal>X*@Y</literal>; you must write
@@ -911,7 +845,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
 
     <listitem>
      <para>
-      Parentheses (<literal>()</literal>) have their usual meaning to
+      Parentheses (<literal>()</literal>) are used to
       group expressions and enforce precedence.  In some cases
       parentheses are required as part of the fixed syntax of a
       particular SQL command.
@@ -1003,7 +937,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
 
    <para>
     A comment is removed from the input stream before further syntax
-    analysis and is effectively replaced by whitespace.
+    analysis and is replaced by whitespace.
    </para>
   </sect2>
 
@@ -1143,12 +1077,11 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
    </table>
 
    <para>
-    Note that the operator precedence rules also apply to user-defined
+    Operator precedence rules also apply to user-defined
     operators that have the same names as the built-in operators
-    mentioned above.  For example, if you define a
-    <quote>+</quote> operator for some custom data type it will have
-    the same precedence as the built-in <quote>+</quote> operator, no
-    matter what yours does.
+    mentioned above.  If you define a <quote>+</quote> operator 
+    for some custom data type it will have the same precedence as 
+    the built-in <quote>+</quote> operator, no matter what yours does.
    </para>
 
    <para>
@@ -1163,23 +1096,6 @@ SELECT 3 OPERATOR(pg_catalog.+) 4;
     which specific operator appears inside <literal>OPERATOR()</literal>.
    </para>
 
-   <note>
-    <para>
-     <productname>PostgreSQL</productname> versions before 9.5 used slightly different
-     operator precedence rules.  In particular, <token>&lt;=</token>
-     <token>&gt;=</token> and <token>&lt;&gt;</token> used to be treated as
-     generic operators; <literal>IS</literal> tests used to have higher priority;
-     and <literal>NOT BETWEEN</literal> and related constructs acted inconsistently,
-     being taken in some cases as having the precedence of <literal>NOT</literal>
-     rather than <literal>BETWEEN</literal>.  These rules were changed for better
-     compliance with the SQL standard and to reduce confusion from
-     inconsistent treatment of logically equivalent constructs.  In most
-     cases, these changes will result in no behavioral change, or perhaps
-     in <quote>no such operator</quote> failures which can be resolved by adding
-     parentheses.  However there are corner cases in which a query might
-     change behavior without any parsing error being reported.
-    </para>
-   </note>
   </sect2>
  </sect1>

Cleanup of syntax.sgml

Reply via email to