Tom Lane wrote:

After sleeping on it, I do think that tying the mechanism to newlines
is just unnecessary complication.  I'm currently leaning to an idea that
was suggested yesterday by (I think) Andreas: let the quote start marker
be a token of the form
        dollarsign zero-or-more-letters dollarsign
and let the quote body extend to the next occurrence of the identical
string.  For example
        ... $Q$Joe's house$Q$ ...
is equivalent to
        ... 'Joe''s house' ...

This is extremely compact for quoting strings that don't contain any
doubled dollar signs, since you don't need any letters at all.  I could
see $$text$$ becoming a very common way to quote material that contains
single quotes or backslashes.  But since you can choose any string of
letters to make up the terminating token, the mechanism is able to quote
any text whatever, including nested occurrences of the same structure
(with a different letterstring of course).
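
As a rough illustration of the rule just described (a sketch only, not the actual lexer code): the opening token is `$`, zero or more letters, `$`, and the body runs to the next occurrence of that identical string. The names `read_dollar_quoted` and `to_single_quoted` are hypothetical, and the conversion handles only quote doubling, not backslashes.

```python
import re

# Opening tag as proposed: '$', zero or more letters, '$'.
TAG_RE = re.compile(r"\$[A-Za-z]*\$")


def read_dollar_quoted(text, pos=0):
    """Return (body, end_pos) for a dollar-quoted string starting at pos.

    Returns None if no opening tag starts at pos. Raises ValueError if the
    opening tag is never repeated.
    """
    m = TAG_RE.match(text, pos)
    if m is None:
        return None
    tag = m.group(0)
    body_start = m.end()
    # The body extends to the next occurrence of the identical tag.
    body_end = text.find(tag, body_start)
    if body_end == -1:
        raise ValueError("unterminated dollar-quoted string: " + tag)
    return text[body_start:body_end], body_end + len(tag)


def to_single_quoted(body):
    """Render body as a single-quoted literal by doubling single quotes.

    Backslash handling is deliberately left out of this sketch.
    """
    return "'" + body.replace("'", "''") + "'"


if __name__ == "__main__":
    body, _ = read_dollar_quoted("$Q$Joe's house$Q$")
    print(to_single_quoted(body))

    # Nesting works as long as the inner tag differs from the outer one.
    outer, _ = read_dollar_quoted("$outer$a $$b's$$ c$outer$")
    print(outer)
```

Note that the inner `$$...$$` in the second call is returned untouched as part of the outer body, which is what makes nesting with a different letter string possible.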

Note that there is no particular need to insist on any nearby newlines.
If the construct is written just following an identifier or keyword,
then you do need some intervening whitespace to keep the $Q$ from being
read as part of that identifier, but I doubt this will bother anyone.

Note that I'm allowing only letters, not digits, in the string; this
avoids any possible ambiguity with $n parameter tokens.  We have no
other SQL tokens that are allowed to start with $, so this creates no
other lexical ambiguity.
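
The letters-only restriction can be checked with a small sketch. The two patterns below are assumptions written for illustration, not the scanner's real rules, and `classify_dollar_token` is a hypothetical helper. Because a quote tag may not contain digits and a parameter must start with one, no input starting with `$` can match both.

```python
import re

# Assumed patterns, for illustration only.
PARAM_RE = re.compile(r"\$[0-9]+")       # $n parameter tokens, e.g. $1
TAG_RE = re.compile(r"\$[A-Za-z]*\$")    # quote start: $, letters, $


def classify_dollar_token(text, pos=0):
    """Classify the token that starts with '$' at pos.

    Returns ("quote_start", tag), ("param", token), or None if neither
    pattern matches at pos.
    """
    m = TAG_RE.match(text, pos)
    if m:
        return ("quote_start", m.group(0))
    m = PARAM_RE.match(text, pos)
    if m:
        return ("param", m.group(0))
    return None


if __name__ == "__main__":
    for sample in ["$1", "$Q$text$Q$", "$$text$$", "$1$", "$abc"]:
        print(repr(sample), classify_dollar_token(sample))
```

In this sketch `$1$` is read as the parameter `$1` followed by a stray `$`, never as a quote start, which is the ambiguity the digits rule is meant to avoid.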

Comments?



I like it. It is really quite similar to Perl's q$text$ mechanism, but makes allowances for the fact that we are in a multi-language environment.

I presume the delimiter will never be kept, but eaten by the lexer. I'd like to see pg_dump use this mechanism for quoting, at least for function bodies. I guess it could retrieve the text and then keep generating delimiters until it found one that didn't occur inside the text. Maybe for that purpose we could allow underscores as well as letters - I don't think that should introduce any extra ambiguities. Alternatively, or as well, maybe leading and trailing digits could be disallowed, but embedded digits could be allowed. IOW let's be as liberal as possible without breaking things.
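
The delimiter search could look roughly like the sketch below. This is only an illustration of the idea, not pg_dump code; `candidate_tags`, `choose_tag`, and `dollar_quote` are hypothetical names, and the sketch sticks to lowercase letters. One detail it has to handle: a tag can be absent from the text and still be unsafe, because a body ending in `$` can combine with the closing tag to form an earlier match.

```python
import itertools
import string


def candidate_tags():
    """Yield '$$', then '$a$', '$b$', ..., '$aa$', '$ab$', and so on."""
    yield "$$"
    for n in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=n):
            yield "$" + "".join(letters) + "$"


def choose_tag(body):
    """Return the first candidate tag that safely delimits body.

    A tag is safe when its first occurrence in body + tag is exactly at
    the end of body. This rejects tags that occur inside the body and
    tags that would match early across the body/terminator boundary.
    """
    for tag in candidate_tags():
        if (body + tag).index(tag) == len(body):
            return tag


def dollar_quote(body):
    """Wrap body in the first safe delimiter."""
    tag = choose_tag(body)
    return tag + body + tag


if __name__ == "__main__":
    print(dollar_quote("Joe's house"))
    print(dollar_quote("SELECT $$nested$$"))
    print(dollar_quote("abc$"))
```

Since the candidates are unbounded and any given text is finite, the loop always ends; in practice it should stop after one or two tries for ordinary function bodies.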

cheers

andrew


