To make it more consumable. --
- Founder - https://commandprompt.com/ - 24x7x365 Postgres since 1997 - Founder and Co-Chair - https://postgresconf.org/ - Founder - https://postgresql.us - United States PostgreSQL - Public speaker, published author, postgresql expert, and people believer. - Host - More than a refresh <https://commandprompt.com/about/more-than-a-refresh/>: A podcast about data and the people who wrangle it.
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml index 916189a7d68..900f8f8e441 100644 --- a/doc/src/sgml/syntax.sgml +++ b/doc/src/sgml/syntax.sgml @@ -15,10 +15,10 @@ </para> <para> - We also advise users who are already familiar with SQL to read this - chapter carefully because it contains several rules and concepts that - are implemented inconsistently among SQL databases or that are - specific to <productname>PostgreSQL</productname>. + We advise users who to read this chapter carefully because it + contains several rules and concepts that are implemented + inconsistently among SQL databases or that are specific to + <productname>PostgreSQL</productname>. </para> <sect1 id="sql-syntax-lexical"> @@ -29,53 +29,48 @@ </indexterm> <para> - SQL input consists of a sequence of + SQL consists of a sequence of <firstterm>commands</firstterm>. A command is composed of a sequence of <firstterm>tokens</firstterm>, terminated by a - semicolon (<quote>;</quote>). The end of the input stream also - terminates a command. Which tokens are valid depends on the syntax - of the particular command. + semicolon (<quote>;</quote>). Which tokens are valid depends on + the syntax of the particular command. </para> <para> A token can be a <firstterm>key word</firstterm>, an <firstterm>identifier</firstterm>, a <firstterm>quoted identifier</firstterm>, a <firstterm>literal</firstterm> (or - constant), or a special character symbol. Tokens are normally - separated by whitespace (space, tab, newline), but need not be if - there is no ambiguity (which is generally only the case if a - special character is adjacent to some other token type). + constant), or a special character symbol. </para> <para> - For example, the following is (syntactically) valid SQL input: + The following is (syntactically) valid SQL input: <programlisting> SELECT * FROM MY_TABLE; UPDATE MY_TABLE SET A = 5; INSERT INTO MY_TABLE VALUES (3, 'hi there'); </programlisting> - This is a sequence of three commands, one per line (although this - is not required; more than one command can be on a line, and - commands can usefully be split across lines). + This is a sequence of three commands, one per line. More + than one command can be on a line, and commands can also be split + across lines. </para> <para> - Additionally, <firstterm>comments</firstterm> can occur in SQL - input. They are not tokens, they are effectively equivalent to - whitespace. + Additionally, <firstterm>comments</firstterm> are not tokens and + can occur in SQL input. </para> <para> - The SQL syntax is not very consistent regarding what tokens - identify commands and which are operands or parameters. The first - few tokens are generally the command name, so in the above example - we would usually speak of a <quote>SELECT</quote>, an - <quote>UPDATE</quote>, and an <quote>INSERT</quote> command. But - for instance the <command>UPDATE</command> command always requires - a <token>SET</token> token to appear in a certain position, and - this particular variation of <command>INSERT</command> also - requires a <token>VALUES</token> in order to be complete. The - precise syntax rules for each command are described in <xref linkend="reference"/>. + SQL is not consistent regarding what tokens identify commands and + which are operands or parameters. The first few tokens are + generally the command name, so in the above example we would + speak of a <quote>SELECT</quote>, an <quote>UPDATE</quote>, and + an <quote>INSERT</quote> command. But for the + <command>UPDATE</command> command always requires a <token>SET</token> + token to appear in a certain position, and this particular + variation of <command>INSERT</command> also requires a + <token>VALUES</token> in order to be complete. The precise syntax + rules for each command are described in <xref linkend="reference"/>. </para> <sect2 id="sql-syntax-identifiers"> @@ -98,11 +93,10 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there'); <para> Tokens such as <token>SELECT</token>, <token>UPDATE</token>, or - <token>VALUES</token> in the example above are examples of - <firstterm>key words</firstterm>, that is, words that have a fixed - meaning in the SQL language. The tokens <token>MY_TABLE</token> - and <token>A</token> are examples of - <firstterm>identifiers</firstterm>. They identify names of + <token>VALUES</token> in the example above are + <firstterm>key words</firstterm> and they have a fixed meaning in + SQL. The tokens <token>MY_TABLE</token> and <token>A</token> are + examples of <firstterm>identifiers</firstterm>. They identify names of tables, columns, or other database objects, depending on the command they are used in. Therefore they are sometimes simply called <quote>names</quote>. Key words and identifiers have the @@ -119,24 +113,12 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there'); (<literal>_</literal>). Subsequent characters in an identifier or key word can be letters, underscores, digits (<literal>0</literal>-<literal>9</literal>), or dollar signs - (<literal>$</literal>). Note that dollar signs are not allowed in identifiers - according to the letter of the SQL standard, so their use might render - applications less portable. - The SQL standard will not define a key word that contains - digits or starts or ends with an underscore, so identifiers of this - form are safe against possible conflict with future extensions of the - standard. - </para> - - <para> - <indexterm><primary>identifier</primary><secondary>length</secondary></indexterm> - The system uses no more than <symbol>NAMEDATALEN</symbol>-1 - bytes of an identifier; longer names can be written in - commands, but they will be truncated. By default, - <symbol>NAMEDATALEN</symbol> is 64 so the maximum identifier - length is 63 bytes. If this limit is problematic, it can be raised by - changing the <symbol>NAMEDATALEN</symbol> constant in - <filename>src/include/pg_config_manual.h</filename>. + (<literal>$</literal>). Note that dollar signs are not allowed in + identifiers according to the letter of the SQL standard, so their use + might render applications less portable. The SQL standard will not + define a key word that contains digits or starts or ends with an + underscore, so identifiers of this form are safe against possible + conflict with future extensions of the standard. </para> <para> @@ -152,7 +134,7 @@ UPDATE MY_TABLE SET A = 5; <programlisting> uPDaTE my_TabLE SeT a = 5; </programlisting> - A convention often used is to write key words in upper + The recommened convention is to write key words in upper case and names in lower case, e.g.: <programlisting> UPDATE my_table SET a = 5; @@ -183,27 +165,25 @@ UPDATE "my_table" SET "a" = 5; <para> Quoted identifiers can contain any character, except the character - with code zero. (To include a double quote, write two double quotes.) - This allows constructing table or column names that would - otherwise not be possible, such as ones containing spaces or - ampersands. The length limitation still applies. + with code zero. This allows constructing table or column names that + would otherwise not be possible, such as ones containing spaces or + ampersands. </para> <para> - Quoting an identifier also makes it case-sensitive, whereas - unquoted names are always folded to lower case. For example, the - identifiers <literal>FOO</literal>, <literal>foo</literal>, and + Quoting an identifier makes it case-sensitive, whereas unquoted names + are always folded to lower case. For example, the identifiers + <literal>FOO</literal>, <literal>foo</literal>, and <literal>"foo"</literal> are considered the same by - <productname>PostgreSQL</productname>, but - <literal>"Foo"</literal> and <literal>"FOO"</literal> are - different from these three and each other. (The folding of - unquoted names to lower case in <productname>PostgreSQL</productname> is - incompatible with the SQL standard, which says that unquoted names - should be folded to upper case. Thus, <literal>foo</literal> - should be equivalent to <literal>"FOO"</literal> not - <literal>"foo"</literal> according to the standard. If you want - to write portable applications you are advised to always quote a - particular name or never quote it.) + <productname>PostgreSQL</productname>, but <literal>"Foo"</literal> + and <literal>"FOO"</literal> are not considered the same. + </para> + <para> + The folding of unquoted names to lower case in is incompatible with the + SQL standard. Thus, <literal>foo</literal> should be equivalent to + <literal>"FOO"</literal> not <literal>"foo"</literal> according to the + standard. If you want to write portable applications you are advised to + always quote a particular name or never quote it. </para> <indexterm> @@ -212,20 +192,16 @@ UPDATE "my_table" SET "a" = 5; </indexterm> <para> - A variant of quoted - identifiers allows including escaped Unicode characters identified - by their code points. This variant starts - with <literal>U&</literal> (upper or lower case U followed by + A variant of quoted identifiers allows including escaped Unicode + characters identified by their code points. This variant starts + with <literal>U&</literal> (upper or lower case U followed by ampersand) immediately before the opening double quote, without any spaces in between, for example <literal>U&"foo"</literal>. - (Note that this creates an ambiguity with the - operator <literal>&</literal>. Use spaces around the operator to - avoid this problem.) Inside the quotes, Unicode characters can be - specified in escaped form by writing a backslash followed by the - four-digit hexadecimal code point number or alternatively a - backslash followed by a plus sign followed by a six-digit - hexadecimal code point number. For example, the - identifier <literal>"data"</literal> could be written as + Inside the quotes, Unicode characters can be specified in escaped + form by writing a backslash followed by the four-digit hexadecimal + code point number or alternatively a backslash followed by a plus + sign followed by a six-digit hexadecimal code point number. For + example, the identifier <literal>"data"</literal> could be written as <programlisting> U&"d\0061t\+000061" </programlisting> @@ -330,9 +306,8 @@ SELECT 'foobar'; <programlisting> SELECT 'foo' 'bar'; </programlisting> - is not valid syntax. (This slightly bizarre behavior is specified - by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is - following the standard.) + is not valid syntax. This behavior is specified by + the <acronym>SQL</acronym> standard. </para> </sect3> @@ -351,9 +326,9 @@ SELECT 'foo' 'bar'; string constants, which are an extension to the SQL standard. An escape string constant is specified by writing the letter <literal>E</literal> (upper or lower case) just before the opening single - quote, e.g., <literal>E'foo'</literal>. (When continuing an escape string + quote, e.g., <literal>E'foo'</literal>. When continuing an escape string constant across lines, write <literal>E</literal> only before the first opening - quote.) + quote. Within an escape string, a backslash character (<literal>\</literal>) begins a C-like <firstterm>backslash escape</firstterm> sequence, in which the combination of backslash and following character(s) represent a special byte @@ -432,37 +407,9 @@ SELECT 'foo' 'bar'; It is your responsibility that the byte sequences you create, especially when using the octal or hexadecimal escapes, compose valid characters in the server character set encoding. - A useful alternative is to use Unicode escapes or the - alternative Unicode escape syntax, explained - in <xref linkend="sql-syntax-strings-uescape"/>; then the server + If you can use Unicode escapes or the alternative Unicode escape syntax, + explained in <xref linkend="sql-syntax-strings-uescape"/>; then the server will check that the character conversion is possible. - </para> - - <caution> - <para> - If the configuration parameter - <xref linkend="guc-standard-conforming-strings"/> is <literal>off</literal>, - then <productname>PostgreSQL</productname> recognizes backslash escapes - in both regular and escape string constants. However, as of - <productname>PostgreSQL</productname> 9.1, the default is <literal>on</literal>, meaning - that backslash escapes are recognized only in escape string constants. - This behavior is more standards-compliant, but might break applications - which rely on the historical behavior, where backslash escapes - were always recognized. As a workaround, you can set this parameter - to <literal>off</literal>, but it is better to migrate away from using backslash - escapes. If you need to use a backslash escape to represent a special - character, write the string constant with an <literal>E</literal>. - </para> - - <para> - In addition to <varname>standard_conforming_strings</varname>, the configuration - parameters <xref linkend="guc-escape-string-warning"/> and - <xref linkend="guc-backslash-quote"/> govern treatment of backslashes - in string constants. - </para> - </caution> - - <para> The character with the code zero cannot be in a string constant. </para> </sect3> @@ -482,9 +429,9 @@ SELECT 'foo' 'bar'; constant starts with <literal>U&</literal> (upper or lower case letter U followed by ampersand) immediately before the opening quote, without any spaces in between, for - example <literal>U&'foo'</literal>. (Note that this creates an + example <literal>U&'foo'</literal>. This creates an ambiguity with the operator <literal>&</literal>. Use spaces - around the operator to avoid this problem.) Inside the quotes, + around the operator to avoid this problem. Inside the quotes, Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit hexadecimal code point number or alternatively a backslash followed by a plus sign @@ -493,7 +440,7 @@ SELECT 'foo' 'bar'; <programlisting> U&'d\0061t\+000061' </programlisting> - The following less trivial example writes the Russian + The following example writes the Russian word <quote>slon</quote> (elephant) in Cyrillic letters: <programlisting> U&'\0441\043B\043E\043D' @@ -532,17 +479,6 @@ U&'d!0061t!+000061' UESCAPE '!' by one of these escape sequences is converted to the actual server encoding; an error is reported if that's not possible. </para> - - <para> - Also, the Unicode escape syntax for string constants only works - when the configuration - parameter <xref linkend="guc-standard-conforming-strings"/> is - turned on. This is because otherwise this syntax could confuse - clients that parse the SQL statements to the point that it could - lead to SQL injections and similar security issues. If the - parameter is set to off, this syntax will be rejected with an - error message. - </para> </sect3> <sect3 id="sql-syntax-dollar-quoting"> @@ -555,24 +491,22 @@ U&'d!0061t!+000061' UESCAPE '!' <para> While the standard syntax for specifying string constants is usually convenient, it can be difficult to understand when the desired string - contains many single quotes, since each of those must - be doubled. To allow more readable queries in such situations, - <productname>PostgreSQL</productname> provides another way, called - <quote>dollar quoting</quote>, to write string constants. - A dollar-quoted string constant - consists of a dollar sign (<literal>$</literal>), an optional - <quote>tag</quote> of zero or more characters, another dollar - sign, an arbitrary sequence of characters that makes up the - string content, a dollar sign, the same tag that began this - dollar quote, and a dollar sign. For example, here are two - different ways to specify the string <quote>Dianne's horse</quote> - using dollar quoting: + contains many single quotes or backslashes. To allow more readable + queries in such situations, <productname>PostgreSQL</productname> + provides <quote>dollar quoting</quote> to write string constants. + A dollar-quoted string constant consists of a dollar sign + (<literal>$</literal>), an optional <quote>tag</quote> of zero or + more characters, another dollar sign, an arbitrary sequence of + characters that makes up the string content, a dollar sign, the same + tag that began this dollar quote, and a dollar sign. For example, here + are two different ways to specify the string + <quote>Dianne's horse</quote> using dollar quoting: <programlisting> $$Dianne's horse$$ $SomeTag$Dianne's horse$SomeTag$ </programlisting> Notice that inside the dollar-quoted string, single quotes can be - used without needing to be escaped. Indeed, no characters inside + used without needing to be escaped. No characters inside a dollar-quoted string are ever escaped: the string content is always written literally. Backslashes are not special, and neither are dollar signs, unless they are part of a sequence matching the opening @@ -590,13 +524,12 @@ BEGIN END; $function$ </programlisting> - Here, the sequence <literal>$q$[\t\r\n\v\\]$q$</literal> represents a + The sequence <literal>$q$[\t\r\n\v\\]$q$</literal> represents a dollar-quoted literal string <literal>[\t\r\n\v\\]</literal>, which will be recognized when the function body is executed by - <productname>PostgreSQL</productname>. But since the sequence does not match + <productname>PostgreSQL</productname>. Since the sequence does not match the outer dollar quoting delimiter <literal>$function$</literal>, it is - just some more characters within the constant so far as the outer - string is concerned. + just some more characters within the constant. </para> <para> @@ -634,7 +567,7 @@ $function$ </indexterm> <para> - Bit-string constants look like regular string constants with a + Bit-string constants is a string constant with a <literal>B</literal> (upper or lower case) immediately before the opening quote (no intervening whitespace), e.g., <literal>B'1001'</literal>. The only characters allowed within @@ -654,6 +587,7 @@ $function$ across lines in the same way as regular string constants. Dollar quoting cannot be used in a bit-string constant. </para> + </sect3> <sect3 id="sql-syntax-constants-numeric"> @@ -665,7 +599,7 @@ $function$ </indexterm> <para> - Numeric constants are accepted in these general forms: + Numeric constants are accepted in these forms: <synopsis> <replaceable>digits</replaceable> <replaceable>digits</replaceable>.<optional><replaceable>digits</replaceable></optional><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional> @@ -678,13 +612,12 @@ $function$ exponent marker (<literal>e</literal>), if one is present. There cannot be any spaces or other characters embedded in the constant, except for underscores, which can be used for visual grouping as - described below. Note that any leading plus or minus sign is not actually - considered part of the constant; it is an operator applied to the - constant. + described below. Any leading plus or minus sign is not considered part of + the constant; it is an operator applied to the constant. </para> <para> - These are some examples of valid numeric constants: + Examples of valid numeric constants: <literallayout> 42 3.5 @@ -711,7 +644,7 @@ $function$ </para> <para> - These are some examples of valid non-decimal integer constants: + Examples of valid non-decimal integer constants: <literallayout> 0b100101 0B10011001 @@ -724,7 +657,7 @@ $function$ <para> For visual grouping, underscores can be inserted between digits. These - have no further effect on the value of the constant. For example: + have no effect on the value of the constant. For example: <literallayout> 1_500_000_000 0b10001000_00000000 @@ -743,7 +676,7 @@ $function$ <indexterm><primary>bigint</primary></indexterm> <indexterm><primary>numeric</primary></indexterm> A numeric constant that contains neither a decimal point nor an - exponent is initially presumed to be type <type>integer</type> if its + exponent is presumed to be type <type>integer</type> if its value fits in type <type>integer</type> (32 bits); otherwise it is presumed to be type <type>bigint</type> if its value fits in type <type>bigint</type> (64 bits); otherwise it is @@ -767,8 +700,8 @@ REAL '1.23' -- string style 1.23::REAL -- PostgreSQL (historical) style </programlisting> - These are actually just special cases of the general casting - notations discussed next. + These are special cases of the general casting notations discussed + next. </para> </sect3> @@ -863,8 +796,9 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> ) <listitem> <para> - A multiple-character operator name cannot end in <literal>+</literal> or <literal>-</literal>, - unless the name also contains at least one of these characters: + A multiple-character operator name cannot end in <literal>+</literal> + or <literal>-</literal>, unless the name also contains at least one + of these characters: <literallayout> ~ ! @ # % ^ & | ` ? </literallayout> @@ -878,7 +812,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> ) </para> <para> - When working with non-SQL-standard operator names, you will usually + When working with non-SQL-standard operator names, you will need to separate adjacent operators with spaces to avoid ambiguity. For example, if you have defined a prefix operator named <literal>@</literal>, you cannot write <literal>X*@Y</literal>; you must write @@ -911,7 +845,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> ) <listitem> <para> - Parentheses (<literal>()</literal>) have their usual meaning to + Parentheses (<literal>()</literal>) are used to group expressions and enforce precedence. In some cases parentheses are required as part of the fixed syntax of a particular SQL command. @@ -1003,7 +937,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> ) <para> A comment is removed from the input stream before further syntax - analysis and is effectively replaced by whitespace. + analysis and is replaced by whitespace. </para> </sect2> @@ -1143,12 +1077,11 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> ) </table> <para> - Note that the operator precedence rules also apply to user-defined + Operator precedence rules also apply to user-defined operators that have the same names as the built-in operators - mentioned above. For example, if you define a - <quote>+</quote> operator for some custom data type it will have - the same precedence as the built-in <quote>+</quote> operator, no - matter what yours does. + mentioned above. If you define a <quote>+</quote> operator + for some custom data type it will have the same precedence as + the built-in <quote>+</quote> operator, no matter what yours does. </para> <para> @@ -1163,23 +1096,6 @@ SELECT 3 OPERATOR(pg_catalog.+) 4; which specific operator appears inside <literal>OPERATOR()</literal>. </para> - <note> - <para> - <productname>PostgreSQL</productname> versions before 9.5 used slightly different - operator precedence rules. In particular, <token><=</token> - <token>>=</token> and <token><></token> used to be treated as - generic operators; <literal>IS</literal> tests used to have higher priority; - and <literal>NOT BETWEEN</literal> and related constructs acted inconsistently, - being taken in some cases as having the precedence of <literal>NOT</literal> - rather than <literal>BETWEEN</literal>. These rules were changed for better - compliance with the SQL standard and to reduce confusion from - inconsistent treatment of logically equivalent constructs. In most - cases, these changes will result in no behavioral change, or perhaps - in <quote>no such operator</quote> failures which can be resolved by adding - parentheses. However there are corner cases in which a query might - change behavior without any parsing error being reported. - </para> - </note> </sect2> </sect1>