On 20.05.20 23:56, Thomas Munro wrote:
On Thu, May 21, 2020 at 1:37 AM PG Doc comments form
<nore...@postgresql.org> wrote:
The following documentation comment has been logged on the website:
Page: https://www.postgresql.org/docs/12/tutorial-join.html
Description:
The tutorial about joins makes the following statement about the explicit
JOIN operator:
This syntax is not as commonly used as the one above
I think in 2020 this claim is no longer true, and I would love to see the
manual prefer the "modern" explicit JOIN operator rather than sticking to
the ancient implicit joins in the WHERE clause.
+1
The "new" syntax is 28 years old, from SQL 92. I don't see too many
SQL 86 joins. Would you like to write a documentation patch?
The attached patch
- prefers the explicit join-syntax over the implicit one and explains
the keywords of the explicit syntax
- uses a more accurate definition of 'join'
- separates <programlisting> and <screen> tags
- shifts <indexterm> definitions outside of <para> to get a better
rendering in PDF
- adds a note concerning IDs and foreign keys
--
J. Purtz
diff --git a/doc/src/sgml/query.sgml b/doc/src/sgml/query.sgml
index c0889743c4..6f8ea373b5 100644
--- a/doc/src/sgml/query.sgml
+++ b/doc/src/sgml/query.sgml
@@ -441,11 +441,11 @@ SELECT DISTINCT city
<para>
Thus far, our queries have only accessed one table at a time.
Queries can access multiple tables at once, or access the same
- table in such a way that multiple rows of the table are being
- processed at the same time. A query that accesses multiple rows
- of the same or different tables at one time is called a
- <firstterm>join</firstterm> query. As an example, say you wish to
- list all the weather records together with the location of the
+ table several times. Such queries — they are called
+ <firstterm>join</firstterm> queries — combine
+ rows of one table in some way with rows of the other table
+ and return a single row per combination. An example may be a
+ list of all the weather records together with the location of the
associated city. To do that, we need to compare the <structfield>city</structfield>
column of each row of the <structname>weather</structname> table with the
<structfield>name</structfield> column of all rows in the <structname>cities</structname>
@@ -461,10 +461,17 @@ SELECT DISTINCT city
<programlisting>
SELECT *
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
+ The keyword <command>JOIN</command> connects the two tables.
+ Behind the keyword <command>ON</command> follows the
+ definition how to compare their rows. In this case, the
+ column <varname>city</varname> of table <varname>weather</varname>
+ must be equal to the column <varname>name</varname>
+ of table <varname>cities</varname>.
+
<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
@@ -476,8 +483,14 @@ SELECT *
</para>
<para>
- Observe two things about the result set:
+ Observe some things about the result set:
<itemizedlist>
+ <listitem>
+ <para>
+ The resulting rows contain values from both tables.
+ </para>
+ </listitem>
+
<listitem>
<para>
There is no result row for the city of Hayward. This is
@@ -499,8 +512,8 @@ SELECT *
<literal>*</literal>:
<programlisting>
SELECT city, temp_lo, temp_hi, prcp, date, location
- FROM weather, cities
- WHERE city = name;
+FROM weather
+JOIN cities ON (city = name);
</programlisting>
</para>
</listitem>
@@ -509,10 +522,9 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<formalpara>
<title>Exercise:</title>
-
<para>
Attempt to determine the semantics of this query when the
- <literal>WHERE</literal> clause is omitted.
+ <literal>ON</literal> clause is omitted.
</para>
</formalpara>
@@ -526,8 +538,8 @@ SELECT city, temp_lo, temp_hi, prcp, date, location
<programlisting>
SELECT weather.city, weather.temp_lo, weather.temp_hi,
weather.prcp, weather.date, cities.location
- FROM weather, cities
- WHERE cities.name = weather.city;
+FROM weather
+JOIN cities ON (cities.name = weather.city);
</programlisting>
It is widely considered good style to qualify all column names
@@ -537,20 +549,22 @@ SELECT weather.city, weather.temp_lo, weather.temp_hi,
<para>
Join queries of the kind seen thus far can also be written in this
- alternative form:
+ form:
<programlisting>
SELECT *
- FROM weather INNER JOIN cities ON (weather.city = cities.name);
+FROM weather, cities
+WHERE weather.city = cities.name;
</programlisting>
- This syntax is not as commonly used as the one above, but we show
- it here to help you understand the following topics.
+ This syntax is not as commonly used as the one above. It dates back
+ to the very first times of SQL, avoids the <literal>JOIN</literal>
+ keyword and uses the <literal>WHERE</literal> clause instead of the
+ <literal>ON</literal> clause.
</para>
+ <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>outer</secondary></indexterm>
-
Now we will figure out how we can get the Hayward records back in.
What we want the query to do is to scan the
<structname>weather</structname> table and for each row to find the
@@ -563,15 +577,19 @@ SELECT *
<programlisting>
SELECT *
- FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name);
+FROM weather
+LEFT OUTER JOIN cities ON (weather.city = cities.name);
+</programlisting>
+
+<screen>
city | temp_lo | temp_hi | prcp | date | name | location
---------------+---------+---------+------+------------+---------------+-----------
Hayward | 37 | 54 | | 1994-11-29 | |
San Francisco | 46 | 50 | 0.25 | 1994-11-27 | San Francisco | (-194,53)
San Francisco | 43 | 57 | 0 | 1994-11-29 | San Francisco | (-194,53)
(3 rows)
-</programlisting>
+</screen>
This query is called a <firstterm>left outer
join</firstterm> because the table mentioned on the left of the
@@ -591,10 +609,9 @@ SELECT *
</para>
</formalpara>
+ <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
+ <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
<para>
- <indexterm><primary>join</primary><secondary>self</secondary></indexterm>
- <indexterm><primary>alias</primary><secondary>for table name in query</secondary></indexterm>
-
We can also join a table against itself. This is called a
<firstterm>self join</firstterm>. As an example, suppose we wish
to find all the weather records that are in the temperature range
@@ -607,30 +624,42 @@ SELECT *
following query:
<programlisting>
-SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
- W2.city, W2.temp_lo AS low, W2.temp_hi AS high
- FROM weather W1, weather W2
- WHERE W1.temp_lo < W2.temp_lo
- AND W1.temp_hi > W2.temp_hi;
+SELECT w1.city, w1.temp_lo AS low, w1.temp_hi AS high,
+ w2.city, w2.temp_lo AS low, w2.temp_hi AS high
+FROM weather w1
+JOIN weather w2 ON (w1.temp_lo < w2.temp_lo AND
+ w1.temp_hi > w2.temp_hi);
+</programlisting>
+<screen>
city | low | high | city | low | high
---------------+-----+------+---------------+-----+------
San Francisco | 43 | 57 | San Francisco | 46 | 50
Hayward | 37 | 54 | San Francisco | 46 | 50
(2 rows)
-</programlisting>
+</screen>
- Here we have relabeled the weather table as <literal>W1</literal> and
- <literal>W2</literal> to be able to distinguish the left and right side
+ Here we have relabeled the weather table as <literal>w1</literal> and
+ <literal>w2</literal> to be able to distinguish the left and right side
of the join. You can also use these kinds of aliases in other
queries to save some typing, e.g.:
<programlisting>
SELECT *
- FROM weather w, cities c
- WHERE w.city = c.name;
+FROM weather w
+JOIN cities c ON (w.city = c.name);
</programlisting>
You will encounter this style of abbreviating quite frequently.
</para>
+
+ <note>
+ <para>
+ The examples shown here combine rows via city names.
+ This should help to understand the concept. Professional
+ solutions prefer to use numerical IDs and foreign keys
+ to join tables.
+ </para>
+ </note>
+
</sect1>