From 5cb4b90ab399395399ea9fe1556801000d316e21 Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Sat, 14 Oct 2023 16:42:58 -0400
Subject: [PATCH v5] Improve boolean predicate JSON Path docs

Following up from a suggestion from Tom Lane[1] to improve the
documentation of boolean predicate JSON path expressions, please find
enclosed a draft patch to do so. It does three things:

1. Converts all of the example path queries to use `jsonb_path_query()`
   and show the results, to make it clearer what the behaviors are.

2. Replaces the list of deviations from the standards with a new
   subsection, with each deviation in its own sub-subsection. The regex
   section is unchanged, but I've greatly expanded the boolean
   expression JSON path section with examples comparing standard filter
   expressions and nonstandard boolean predicates. I've also added an
   exhortation not use boolean expressions with @? or standard path
   expressions with @@.

3. While converting the modes section to use `jsonb_path_query()` and
   show the results, I also added an example of strict mode returning an
   error and a section demonstrating the difference in query results
   when using lax vs. strict mode.

4. Removes the notes about "only the first value" in the `@@` and
   `jsonb_path_match()` docs, and noted that they support only predicate
   JSON path expressions. Also noted that `json_path_exists()` supports
   only SQL standard path expressions --- in other words, *not*
   predicate path queries.
---
 doc/src/sgml/func.sgml | 323 +++++++++++++++++++++++++++++------------
 1 file changed, 229 insertions(+), 94 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7c3e940afe..a157b00a5d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -15838,7 +15838,10 @@ table2-mapping
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
-        Does JSON path return any item for the specified JSON value?
+        Does JSON path return any item for the specified JSON value? Use only
+        SQL-standard JSON path expressions, not
+        <link linkend="boolean-predicate-path-expressions">predicate check
+        expressions.</link>
        </para>
        <para>
         <literal>'{"a":[1,2,3,4,5]}'::jsonb @? '$.a[*] ? (@ > 2)'</literal>
@@ -15852,10 +15855,12 @@ table2-mapping
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
-        Returns the result of a JSON path predicate check for the
-        specified JSON value.  Only the first item of the result is taken into
-        account.  If the result is not Boolean, then <literal>NULL</literal>
-        is returned.
+        Returns the result of a JSON path
+        <link linkend="boolean-predicate-path-expressions">predicate
+        check</link> for the specified JSON value. If the result is not Boolean,
+        then <literal>NULL</literal> is returned. Use only with
+        <link linkend="boolean-predicate-path-expressions">predicate check
+        expressions.</link>
        </para>
        <para>
         <literal>'{"a":[1,2,3,4,5]}'::jsonb @@ '$.a[*] > 2'</literal>
@@ -16926,7 +16931,9 @@ array w/o UK? | t
        </para>
        <para>
         Checks whether the JSON path returns any item for the specified JSON
-        value.
+        value. Use only SQL-standard JSON path expressions, not
+        <link linkend="boolean-predicate-path-expressions">predicate check
+        expressions.</link>
         If the <parameter>vars</parameter> argument is specified, it must
         be a JSON object, and its fields provide named values to be
         substituted into the <type>jsonpath</type> expression.
@@ -16949,12 +16956,14 @@ array w/o UK? | t
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
-        Returns the result of a JSON path predicate check for the specified
-        JSON value.  Only the first item of the result is taken into account.
-        If the result is not Boolean, then <literal>NULL</literal> is returned.
-        The optional <parameter>vars</parameter>
-        and <parameter>silent</parameter> arguments act the same as
-        for <function>jsonb_path_exists</function>.
+        Returns the result of a JSON path
+        <link linkend="boolean-predicate-path-expressions">predicate
+        check</link> for the specified JSON value. If the result is not Boolean,
+        then <literal>NULL</literal> is returned. Use only with
+        <link linkend="boolean-predicate-path-expressions">predicate check
+        expressions.</link> The optional <parameter>vars</parameter> and
+        <parameter>silent</parameter> arguments act the same as for
+        <function>jsonb_path_exists</function>.
        </para>
        <para>
         <literal>jsonb_path_match('{"a":[1,2,3,4,5]}', 'exists($.a[*] ? (@ >= $min &amp;&amp; @ &lt;= $max))', '{"min":2, "max":4}')</literal>
@@ -16972,8 +16981,12 @@ array w/o UK? | t
        </para>
        <para>
         Returns all JSON items returned by the JSON path for the specified
-        JSON value.
-        The optional <parameter>vars</parameter>
+        JSON value. For SQL-standard JSON path expressions it returns the JSON
+        values selected from <parameter>target</parameter>. For
+        <link linkend="boolean-predicate-path-expressions">predicate check</link>
+        Path expressions it returns the result of the predicate check:
+        <literal>true</literal>, <literal>false</literal>, or
+        <literal>null</literal>. The optional <parameter>vars</parameter>
         and <parameter>silent</parameter> arguments act the same as
         for <function>jsonb_path_exists</function>.
        </para>
@@ -17203,9 +17216,12 @@ array w/o UK? | t
 
   <para>
    For example, suppose you have some JSON data from a GPS tracker that you
-   would like to parse, such as:
+   would like to parse, such as this JSON, set up as a
+   <link linkend="app-psql-meta-command-set"><application>psql</application>
+   <command>\set</command> variable</link> for use as <literal>:'json'</literal>
+   in the examples below:
 <programlisting>
-{
+ \set json '{
   "track": {
     "segments": [
       {
@@ -17220,7 +17236,7 @@ array w/o UK? | t
       }
     ]
   }
-}
+}'
 </programlisting>
   </para>
 
@@ -17228,9 +17244,13 @@ array w/o UK? | t
    To retrieve the available track segments, you need to use the
    <literal>.<replaceable>key</replaceable></literal> accessor
    operator to descend through surrounding JSON objects:
-<programlisting>
-$.track.segments
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments');</userinput>
+select jsonb_path_query(:'json', '$.track.segments');
+                                                                         jsonb_path_query
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ [{"HR": 73, "location": [47.763, 13.4034], "start time": "2018-10-14 10:05:14"}, {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}]
+</screen>
   </para>
 
   <para>
@@ -17238,18 +17258,25 @@ $.track.segments
    <literal>[*]</literal> operator. For example,
    the following path will return the location coordinates for all
    the available track segments:
-<programlisting>
-$.track.segments[*].location
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].location');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
+</screen>
   </para>
 
   <para>
    To return the coordinates of the first segment only, you can
    specify the corresponding subscript in the <literal>[]</literal>
    accessor operator. Recall that JSON array indexes are 0-relative:
-<programlisting>
-$.track.segments[0].location
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments[0].location');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+</screen>
   </para>
 
   <para>
@@ -17258,9 +17285,12 @@ $.track.segments[0].location
    listed in <xref linkend="functions-sqljson-path-operators"/>.
    Each method name must be preceded by a dot. For example,
    you can get the size of an array:
-<programlisting>
-$.track.segments.size()
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments.size()');</userinput>
+ jsonb_path_query
+------------------
+ 2
+</screen>
    More examples of using <type>jsonpath</type> operators
    and methods within path expressions appear below in
    <xref linkend="functions-sqljson-path-operators"/>.
@@ -17301,9 +17331,12 @@ $.track.segments.size()
   <para>
    For example, suppose you would like to retrieve all heart rate values higher
    than 130. You can achieve this using the following expression:
-<programlisting>
-$.track.segments[*].HR ? (@ &gt; 130)
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].HR ? (@ &gt; 130)');</userinput>
+ jsonb_path_query
+------------------
+ 135
+</screen>
   </para>
 
   <para>
@@ -17311,65 +17344,108 @@ $.track.segments[*].HR ? (@ &gt; 130)
    filter out irrelevant segments before returning the start times, so the
    filter expression is applied to the previous step, and the path used
    in the condition is different:
-<programlisting>
-$.track.segments[*] ? (@.HR &gt; 130)."start time"
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*] ? (@.HR &gt; 130)."start time"');</userinput>
+   jsonb_path_query
+-----------------------
+ "2018-10-14 10:39:21"
+</screen>
   </para>
 
   <para>
    You can use several filter expressions in sequence, if required. For
    example, the following expression selects start times of all segments that
    contain locations with relevant coordinates and high heart rate values:
-<programlisting>
-$.track.segments[*] ? (@.location[1] &lt; 13.4) ? (@.HR &gt; 130)."start time"
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*] ? (@.location[1] &lt; 13.4) ? (@.HR &gt; 130)."start time"');</userinput>
+   jsonb_path_query
+-----------------------
+ "2018-10-14 10:39:21"
+</screen>
   </para>
 
   <para>
    Using filter expressions at different nesting levels is also allowed.
    The following example first filters all segments by location, and then
    returns high heart rate values for these segments, if available:
-<programlisting>
-$.track.segments[*] ? (@.location[1] &lt; 13.4).HR ? (@ &gt; 130)
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*] ? (@.location[1] &lt; 13.4).HR ? (@ &gt; 130)');</userinput>
+ jsonb_path_query
+------------------
+ 135
+</screen>
   </para>
 
   <para>
    You can also nest filter expressions within each other:
-<programlisting>
-$.track ? (exists(@.segments[*] ? (@.HR &gt; 130))).segments.size()
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', '$.track ? (exists(@.segments[*] ? (@.HR &gt; 130))).segments.size()');</userinput>
+ jsonb_path_query
+------------------
+ 2
+</screen>
    This expression returns the size of the track if it contains any
    segments with high heart rate values, or an empty sequence otherwise.
   </para>
 
-  <para>
-   <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
-   language has the following deviations from the SQL/JSON standard:
-  </para>
+  <sect3 id="deviations-from-the-standard">
+  <title>Deviations from the SQL Standard</title>
+   <para>
+    <productname>PostgreSQL</productname>'s implementation of the SQL/JSON path
+    language has the following deviations from the SQL/JSON standard.
+   </para>
 
-  <itemizedlist>
-   <listitem>
+   <sect4 id="boolean-predicate-path-expressions">
+   <title>Boolean Predicate Path Expressions</title>
     <para>
-     A path expression can be a Boolean predicate, although the SQL/JSON
-     standard allows predicates only in filters.  This is necessary for
-     implementation of the <literal>@@</literal> operator. For example,
-     the following <type>jsonpath</type> expression is valid in
-     <productname>PostgreSQL</productname>:
-<programlisting>
-$.track.segments[*].HR &lt; 70
-</programlisting>
-    </para>
-   </listitem>
+     As an extension to the SQL standard, a <productname>PostgreSQL</productname>
+     path expression can be a Boolean predicate, whereas the SQL standard allows
+     predicates only in filters. Where SQL standard path expressions return the
+     relevant contents of the queried JSON value, predicate path expressions
+     return the three-valued result of the predicate: <literal>true</literal>,
+     <literal>false</literal>, or <literal>unknown</literal>. Compare this
+     filter <type>jsonpath</type> expression:
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments ?(@[*].HR &gt; 130)');</userinput>
+                                jsonb_path_query
+---------------------------------------------------------------------------------
+ {"HR": 135, "location": [47.706, 13.2635], "start time": "2018-10-14 10:39:21"}
+</screen>
+     To a predicate expression, which returns <literal>true</literal>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', '$.track.segments[*].HR &gt; 130');</userinput>
+ jsonb_path_query
+------------------
+ true
+</screen>
+     </para>
 
-   <listitem>
-    <para>
-     There are minor differences in the interpretation of regular
-     expression patterns used in <literal>like_regex</literal> filters, as
-     described in <xref linkend="jsonpath-regular-expressions"/>.
-    </para>
-   </listitem>
-  </itemizedlist>
+     <note>
+      <para>
+       Predicate-only path expressions are necessary for implementation of the
+       <literal>@@</literal> operator (and the
+       <function>jsonb_path_match</function> function), and should not be used
+       with the <literal>@?</literal> operator (or
+       <function>jsonb_path_exists</function> function).
+      </para>
+
+      <para>
+       Conversely, non-predicate <type>jsonpath</type> expressions should not be
+       used with the <literal>@@</literal> operator (or the
+       <function>jsonb_path_match</function> function).
+      </para>
+     </note>
+    </sect4>
+
+    <sect4 id="jsonpath-regular-expression-deviation">
+    <title>Regular Expression Interpretation</title>
+     <para>
+      There are minor differences in the interpretation of regular
+      expression patterns used in <literal>like_regex</literal> filters, as
+      described in <xref linkend="jsonpath-regular-expressions"/>.
+     </para>
+    </sect4>
+   </sect3>
 
    <sect3 id="strict-and-lax-modes">
    <title>Strict and Lax Modes</title>
@@ -17430,40 +17506,99 @@ $.track.segments[*].HR &lt; 70
     For example, when querying the GPS data listed above, you can
     abstract from the fact that it stores an array of segments
     when using the lax mode:
-<programlisting>
-lax $.track.segments.location
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'lax $.track.segments.location');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
+</screen>
    </para>
 
    <para>
-    In the strict mode, the specified path must exactly match the structure of
+    In strict mode, the specified path must exactly match the structure of
     the queried JSON document to return an SQL/JSON item, so using this
-    path expression will cause an error. To get the same result as in
-    the lax mode, you have to explicitly unwrap the
+    path expression will cause an error:
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments.location');</userinput>
+ERROR:  jsonpath member accessor can only be applied to an object
+</screen>
+    To get the same result as in the lax mode, you have to explicitly unwrap the
     <literal>segments</literal> array:
-<programlisting>
-strict $.track.segments[*].location
-</programlisting>
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments[*].location');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
+</screen>
    </para>
 
-   <para>
-    The <literal>.**</literal> accessor can lead to surprising results
-    when using the lax mode. For instance, the following query selects every
-    <literal>HR</literal> value twice:
-<programlisting>
-lax $.**.HR
-</programlisting>
-    This happens because the <literal>.**</literal> accessor selects both
-    the <literal>segments</literal> array and each of its elements, while
-    the <literal>.HR</literal> accessor automatically unwraps arrays when
-    using the lax mode. To avoid surprising results, we recommend using
-    the <literal>.**</literal> accessor only in the strict mode. The
-    following query selects each <literal>HR</literal> value just once:
-<programlisting>
-strict $.**.HR
-</programlisting>
-   </para>
+   <sect4 id="lax-mode-unwrapping-results">
+   <title>Suprising Results from Unwrapping</title>
+    <para>
+     The unwrapping behavior of lax mode can lead to surprising results. For
+     instance, the following query using the <literal>.**</literal> accessor
+     selects every <literal>HR</literal> value twice:
+ <screen>
+ <prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'lax $.**.HR');</userinput>
+  jsonb_path_query
+ ------------------
+  73
+  135
+  73
+  135
+ </screen>
+     This happens because the <literal>.**</literal> accessor selects both
+     the <literal>segments</literal> array and each of its elements, while
+     the <literal>.HR</literal> accessor automatically unwraps arrays when
+     using the lax mode. To avoid surprising results, we recommend using
+     the <literal>.**</literal> accessor only in the strict mode. The
+     following query selects each <literal>HR</literal> value just once:
+ <screen>
+ <prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'strict $.**.HR');</userinput>
+  jsonb_path_query
+ ------------------
+  73
+  135
+ </screen>
+    </para>
+
+    <para>
+     The unwrapping of arrays can also lead to unexpected results. Consider this
+     example, which selects all the <literal>location</literal> arrays:
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'lax $.track.segments[*].location');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
+(2 rows)
+</screen>
+     As expected it returns the full arrays. But applying a filter expression
+     causes the arrays to be unwrapped to evaluate each item, returning only the
+     items that match the expression:
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'lax $.track.segments[*].location ?(@[*] &gt; 15)');</userinput>
+ jsonb_path_query
+------------------
+ 47.763
+ 47.706
+(2 rows)
+</screen>
+     This despite the fact that the full arrays are selected by the path
+     expression. Use strict mode to restore selecting the arrays:
+<screen>
+<prompt>=&gt;</prompt> <userinput>select jsonb_path_query(:'json', 'strict $.track.segments[*].location ?(@[*] &gt; 15)');</userinput>
+ jsonb_path_query
+-------------------
+ [47.763, 13.4034]
+ [47.706, 13.2635]
+(2 rows)
+</screen>
+    </para>
 
+   </sect4>
    </sect3>
 
    <sect3 id="functions-sqljson-path-operators">
-- 
2.42.0

