Hi David,

On 2024-03-16 19:39 +0100, David E. Wheeler wrote:
> The jsonpath doc[1] has an excellent description of the format of
> strings, but for unquoted path keys, it simply says:
> 
> > Member accessor that returns an object member with the specified
> > key. If the key name matches some named variable starting with $ or
> > does not meet the JavaScript rules for an identifier, it must be
> > enclosed in double quotes to make it a string literal.
> 
> I went looking for the JavaScript rules for an identifier and found
> this in the MDN docs[2]:
> 
> > In JavaScript, identifiers can contain Unicode letters, $, _, and
> > digits (0-9), but may not start with a digit. An identifier differs
> > from a string in that a string is data, while an identifier is part
> > of the code. In JavaScript, there is no way to convert identifiers
> > to strings, but sometimes it is possible to parse strings into
> > identifiers.
> 
> 
> However, the Postgres parsing of jsonpath keys appears to follow the
> same rules as strings, allowing backslash escapes:
> 
> david=# select '$.fo\u00f8 == $x'::jsonpath;
>      jsonpath       -------------------
>  ($."foø" == $"x")
> 
> This would seem to contradict the documentation. Is this behavior
> required by the SQL standard? Do the docs need updating? Or should the
> code actually follow the JSON identifier behavior?

That quoted MDN page does not give the whole picture.  ECMAScript and JS
do allow Unicode escape sequences in identifier names:

https://262.ecma-international.org/#sec-identifier-names
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#identifiers

> PS: Those excellent docs on strings mentions support for \v, but the
> grammar in the right nav of https://www.json.org/json-en.html does
> not. Another bonus feature?

You refer to that sentence: "Other special backslash sequences include
those recognized in JSON strings: \b, \f, \n, \r, \t, \v for various
ASCII control characters, and \uNNNN for a Unicode character identified
by its 4-hex-digit code point."

Mentioning JSON and \v in the same sentence is wrong: JavaScript allows
that escape in strings but JSON doesn't.  I think the easiest is to just
replace "JSON" with "JavaScript" in that sentence to make it right.  The
paragraph also already says "embedded string literals follow JavaScript/
ECMAScript conventions", so mentioning JSON seems unnecessary to me.

The last sentence also mentions backslash escapes \xNN and \u{N...} as
deviations from JSON when in fact those are valid escape sequences from
ECMA-262: https://262.ecma-international.org/#prod-HexEscapeSequence
So I think it makes sense to reword the entire backslash part of the
paragraph and remove references to JSON entirely.  The attached patch
does that and also formats the backslash escapes as a bulleted list for
readability.

> [1]: https://www.postgresql.org/docs/16/datatype-json.html#DATATYPE-JSONPATH
> [2]: https://developer.mozilla.org/en-US/docs/Glossary/Identifier

On 2024-03-16 21:33 +0100, David E. Wheeler wrote:
> On Mar 16, 2024, at 14:39, David E. Wheeler <da...@justatheory.com>
> wrote:
> 
> > I went looking for the JavaScript rules for an identifier and found
> > this in the MDN docs[2]:
> > 
> >> In JavaScript, identifiers can contain Unicode letters, $, _, and
> >> digits (0-9), but may not start with a digit. An identifier differs
> >> from a string in that a string is data, while an identifier is part
> >> of the code. In JavaScript, there is no way to convert identifiers
> >> to strings, but sometimes it is possible to parse strings into
> >> identifiers.
> 
> Coda: Dollar signs don’t work at all outside double-quoted string
> identifiers:
> 
> david=# select '$.$foo'::jsonpath;
> ERROR:  syntax error at or near "$foo" of jsonpath input
> LINE 1: select '$.$foo'::jsonpath;
>                ^
> 
> david=# select '$.f$oo'::jsonpath;
> ERROR:  syntax error at or near "$oo" of jsonpath input
> LINE 1: select '$.f$oo'::jsonpath;
>                ^
> 
> david=# select '$."$foo"'::jsonpath;
>  jsonpath 
> ----------
>  $."$foo"
> 
> This, too, contradicts the MDM definition an identifier (and some
> quick browser tests).

The first case ($.$foo) is in line with the restriction on member
accessors that you quoted first.

The error message 'syntax error at or near "$oo" of jsonpath input' for
the second case ($.f$oo), however, looks as if the scanner identifies
'$oo' as a variable instead of contiuing the scan of identifier (f$oo)
for the member accessor.  Looks like a bug to me because a variable
doesn't even make sense in that place.

What works though, besides double quoting, is escaping the dollar sign:

        regress=# select '$.\u0024foo'::jsonpath;
         jsonpath
        ----------
         $."$foo"
        (1 row)

And we've come full circle :)

-- 
Erik
>From a2bade71867aecbea90c7c03f0295cecca0c215d Mon Sep 17 00:00:00 2001
From: Erik Wienhold <e...@ewie.name>
Date: Sun, 17 Mar 2024 19:28:07 +0100
Subject: [PATCH v1] Simplify docs on backslash escapes in jsonpath

The paragraph describing the JavaScript string literals allowed in
jsonpath expressions is not ideal: it unnecessarily mentions JSON by
erroneously listing \v as allowed by JSON and mentioning the \xNN and
\u{N...} backslash escapes as deviations from JSON when in fact both are
accepted by ECMAScript/JavaScript.  Fix this by only referring to
JavaScript and with a bulleted list of backslash escapes to make it more
readable.
---
 doc/src/sgml/json.sgml | 47 ++++++++++++++++++++++++++++--------------
 1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/doc/src/sgml/json.sgml b/doc/src/sgml/json.sgml
index 1dbb9606e9..1ec4b90abb 100644
--- a/doc/src/sgml/json.sgml
+++ b/doc/src/sgml/json.sgml
@@ -803,21 +803,38 @@ UPDATE table_name SET jsonb_field[1]['a'] = '1';
    In particular, the way to write a double quote within an embedded string
    literal is <literal>\"</literal>, and to write a backslash itself, you
    must write <literal>\\</literal>.  Other special backslash sequences
-   include those recognized in JSON strings:
-   <literal>\b</literal>,
-   <literal>\f</literal>,
-   <literal>\n</literal>,
-   <literal>\r</literal>,
-   <literal>\t</literal>,
-   <literal>\v</literal>
-   for various ASCII control characters, and
-   <literal>\u<replaceable>NNNN</replaceable></literal> for a Unicode
-   character identified by its 4-hex-digit code point.  The backslash
-   syntax also includes two cases not allowed by JSON:
-   <literal>\x<replaceable>NN</replaceable></literal> for a character code
-   written with only two hex digits, and
-   <literal>\u{<replaceable>N...</replaceable>}</literal> for a character
-   code written with 1 to 6 hex digits.
+   include those recognized in JavaScript strings:
+   <itemizedlist>
+    <listitem>
+     <para>
+      <literal>\b</literal>,
+      <literal>\f</literal>,
+      <literal>\n</literal>,
+      <literal>\r</literal>,
+      <literal>\t</literal>,
+      <literal>\v</literal>
+      for various ASCII control characters
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      <literal>\x<replaceable>NN</replaceable></literal>
+      for a character code written with 2 hex digits
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      <literal>\u<replaceable>NNNN</replaceable></literal>
+      for a Unicode code point written with 4 hex digits
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      <literal>\u{<replaceable>N...</replaceable>}</literal>
+      for a Unicode code point written with 1 to 6 hex digits
+     </para>
+    </listitem>
+   </itemizedlist>
   </para>
 
   <para>
-- 
2.44.0

Reply via email to