Re: [HACKERS] Unicode string literals versus the world

Tom Lane Tue, 14 Apr 2009 07:32:30 -0700

Peter Eisentraut <pete...@gmx.net> writes:
> On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
>> If we let this go into 8.4, our previous rounds with security holes
>> caused by careless string parsing will look like a day at the beach.


> Note that the escape character marks the Unicode escapes; it doesn't
> affect the quote characters that delimit the string.  So offhand I
> can't see any potential for quote confusion/SQL injection type
> problems.  Please elaborate if you see a problem.

The problem is the interaction with non-standard-conforming strings.

Case 1:

        select u&'foo\' uescape ',' ...

The backend will see the backslash as just a data character, and
will think that "..." is live SQL text.  A non-Unicode-literal-aware
frontend will think that the backslash escapes the second quote, the
comma is live SQL text, and the ... is quoted material.  Construction
of an actual SQL injection attack is left as an exercise for the reader,
but certainly the raw material is here.

Case 2:

        select u&'foo' uescape '\' ...

Again, any existing frontend code will think that the backslash quotes
the final quote and the ... is quoted material.  This one is
particularly nasty because we allow arbitrary amounts of whitespace and
commenting on either side of "uescape":


        select u&'foo' 
        /* hello joe, do you /* understand nested comments today?
        */
        -- yes, this one too */
        uescape
        -- but not this one /*
        '\' ...

I suspect that it's actually impossible to parse such a thing correctly
without a full-fledged flex lexer or something of equivalent complexity.
Certainly it's a couple of orders of magnitude harder than it is for
either standard-conforming or E'' literals.

Case 3:

        select u&'foo\' uescape ',' ...

        select u & 'foo\' uescape ',' ...

In the first form the ... is live SQL, in the second form it is quoted
material.  This means that you might correctly validate a query and then
have your results invalidated by later processing that innocently adds
or removes whitespace.  (This is particularly nasty in a standard that
demands we parse "x/-1" and "x / -1" the same ...)

So what we've got here is a whole new set of potential SQL injection
attacks by confusing frontend literal-syntax checking, plus a truly
staggering increase in the minimum *required* complexity of such
checking.  I understand the usefulness of being able to write Unicode
code points, but they're not useful enough to justify this syntax.

This thread has already mentioned a couple of ways we could add the
same facility without making any lexical-level changes, at least for
data values.  I admit that the SQL:2008 way also covers Unicode code
points in identifiers, which we can't emulate without a lexical change;
but frankly I think the use-case for that is so thin as to be almost
nonexistent.  Who is going to choose identifiers that they cannot easily
type on their keyboards?

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Unicode string literals versus the world

Reply via email to