Peter Eisentraut <pete...@gmx.net> writes: > On Saturday 11 April 2009 00:54:25 Tom Lane wrote: >> If we let this go into 8.4, our previous rounds with security holes >> caused by careless string parsing will look like a day at the beach.
> Note that the escape character marks the Unicode escapes; it doesn't > affect the quote characters that delimit the string. So offhand I > can't see any potential for quote confusion/SQL injection type > problems. Please elaborate if you see a problem. The problem is the interaction with non-standard-conforming strings. Case 1: select u&'foo\' uescape ',' ... The backend will see the backslash as just a data character, and will think that "..." is live SQL text. A non-Unicode-literal-aware frontend will think that the backslash escapes the second quote, the comma is live SQL text, and the ... is quoted material. Construction of an actual SQL injection attack is left as an exercise for the reader, but certainly the raw material is here. Case 2: select u&'foo' uescape '\' ... Again, any existing frontend code will think that the backslash quotes the final quote and the ... is quoted material. This one is particularly nasty because we allow arbitrary amounts of whitespace and commenting on either side of "uescape": select u&'foo' /* hello joe, do you /* understand nested comments today? */ -- yes, this one too */ uescape -- but not this one /* '\' ... I suspect that it's actually impossible to parse such a thing correctly without a full-fledged flex lexer or something of equivalent complexity. Certainly it's a couple of orders of magnitude harder than it is for either standard-conforming or E'' literals. Case 3: select u&'foo\' uescape ',' ... select u & 'foo\' uescape ',' ... In the first form the ... is live SQL, in the second form it is quoted material. This means that you might correctly validate a query and then have your results invalidated by later processing that innocently adds or removes whitespace. (This is particularly nasty in a standard that demands we parse "x/-1" and "x / -1" the same ...) So what we've got here is a whole new set of potential SQL injection attacks by confusing frontend literal-syntax checking, plus a truly staggering increase in the minimum *required* complexity of such checking. I understand the usefulness of being able to write Unicode code points, but they're not useful enough to justify this syntax. This thread has already mentioned a couple of ways we could add the same facility without making any lexical-level changes, at least for data values. I admit that the SQL:2008 way also covers Unicode code points in identifiers, which we can't emulate without a lexical change; but frankly I think the use-case for that is so thin as to be almost nonexistent. Who is going to choose identifiers that they cannot easily type on their keyboards? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers