"Jonathan S. Katz" <jk...@postgresql.org> writes:
> On 9/17/19 6:40 PM, Tom Lane wrote:
>> After a re-read of the XQuery spec, it seems to me that the character
>> entry form that they have and we don't is actually "&#NNNN;" like
>> HTML, rather than just "#NN".  Can anyone double-check that?
> Clicking through the XQuery spec eventually got me to here[1] (which warns
> me that it's out of date, but that is what its "current" specs linked me
> to), which describes being able to use "&#[0-9]+;" and "&#x[0-9a-fA-F]+;"
> to specify characters (which I recognize as a character escape from
> HTML, XML et al.).

After further reading, it seems like what that text is talking about
is not actually a regex feature, but an outgrowth of the fact that
the regex pattern is being expressed as a string literal in a language
for which XML character entities are a native aspect of the string
literal syntax.  So it looks to me like the entities get folded to
raw characters in a string-literal parser before the regex engine
ever sees them.

As such, I think this doesn't apply to SQL/JSON.  The SQL/JSON spec
seems to defer to JavaScript/ECMAScript for syntax details, and in
either of those languages you have backslash escape sequences for
writing weird characters, *not* XML entities.  You certainly wouldn't
have use of such entities in a native implementation of LIKE_REGEX
in SQL.

So now I'm thinking we can just remove the handwaving about entities.
On the other hand, this points up a large gap in our docs about
SQL/JSON, which is that nowhere do they even address the question of
what the string literal syntax is within a path expression, much less
point out that that syntax is nothing like native SQL strings.  Good
luck finding out from the docs that you'd better double any backslashes
you'd like to have in your regex --- but a moment's testing proves that
that is the case in our code as it stands.  Have we misread the spec
badly enough to get this wrong?

			regards, tom lane
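
PS: to make the two-stage parsing argument concrete, here is a minimal
sketch in Python.  It is not PostgreSQL or XQuery code.  The function
names fold_entities and decode_path_literal are hypothetical;
html.unescape stands in for an XQuery-style string-literal parser, and
json.loads stands in for an ECMAScript-style one.  Both are assumptions
made for illustration only, and neither claims to match what any real
jsonpath implementation does with escapes it does not recognize.

```python
import html
import json
import re


def fold_entities(literal: str) -> str:
    """Stand-in for an XQuery-like string-literal parser (assumption):
    character references become raw characters before any regex
    engine is involved."""
    return html.unescape(literal)


def decode_path_literal(quoted: str) -> str:
    """Stand-in for an ECMAScript-like string-literal parser (assumption):
    backslash escapes are consumed at this stage, so a backslash that
    must reach the regex engine has to be written twice."""
    return json.loads(quoted)


# XQuery-style host language: entities are folded by the literal parser,
# so the regex engine receives plain characters and needs no entity support.
pattern = fold_entities("caf&#233;|na&#xEF;ve")
assert pattern == "café|naïve"
assert re.fullmatch(pattern, "café")

# ECMAScript-style host language: the text inside the path expression is
#   "^\\d+$"
# and the literal parser reduces the doubled backslash to a single one.
pattern = decode_path_literal(r'"^\\d+$"')
assert pattern == r"^\d+$"
assert re.fullmatch(pattern, "123")

# With this particular stand-in, an undoubled backslash never reaches the
# regex engine as \d at all: JSON rejects the unknown escape outright.
try:
    decode_path_literal(r'"^\d+$"')
except ValueError:
    pass  # json.JSONDecodeError is a subclass of ValueError
```

In both halves the regex engine only sees the output of the literal
parser, which is why the entity syntax belongs to the host language
rather than to the regex flavor, and why backslashes meant for the
regex need doubling when the host literal syntax also uses backslash
escapes.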