-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160
I've been wondering how best to solve the standard_conforming_strings
(SCS) problem. Here's some quick background:
In version 8.2 of Postgres, this concept was introduced. As the
docs say:
"This controls whether ordinary string literals ('...') treat
backslashes literally, as specified in the SQL standard. ...
The default will change to on in a future release to improve
compatibility with the standard."
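In other words, 'foo\bar' in 8.2 with SCS off is foo\x08ar (the \b
is read as a backspace), while 'foo\bar' with SCS on is foo, a
literal backslash, then bar. Likewise, 'foo\\bar' is one backslash
with SCS off, but two backslashes with SCS on.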
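To make the difference concrete, here is a rough Python model of how the
text between the quotes is read in each mode. It is only a sketch: the
function name interpret and the small escape table are my own invention,
and the real server lexer handles many more cases (octal, hex, Unicode).

```python
# Simplified model of how the body of a '...' literal is read.
# Only a handful of escapes are modelled; this is not the server's lexer.
ESCAPES = {"b": "\x08", "n": "\n", "t": "\t", "\\": "\\", "'": "'"}

def interpret(body, scs_on):
    """Return the value the server would store for the text between quotes."""
    if scs_on:
        # Standard behaviour: backslash is an ordinary character,
        # and only a doubled single quote is special.
        return body.replace("''", "'")
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Unknown escapes fall back to the character itself.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        elif body[i:i + 2] == "''":
            out.append("'")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(repr(interpret(r"foo\bar", scs_on=False)))
print(repr(interpret(r"foo\bar", scs_on=True)))
```

The same seven characters give a backspace in one mode and a literal
backslash in the other, which is the whole problem in miniature.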
Basically, DBD::Pg will simply be using the E'' format to keep the
actual things inside the single quotes the way they are now. I think
trying to fully support the standard by allowing things like 'foo\\bar'
to differ depending on context would cause too much confusion. Knowing
when to switch from one form to another ('' to E'') will be very
tricky, however.
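As a sketch of that idea (in Python rather than the driver's own code, and
with a made-up function name, quote_literal), the body inside the quotes is
escaped the same way in both cases and only the prefix changes:

```python
def quote_literal(value, scs_on):
    """Quote a string for use in SQL, keeping backslash-escaped bodies.

    Hypothetical helper, not the actual DBD::Pg quote routine.
    """
    # Double backslashes first, then single quotes, exactly as when
    # standard_conforming_strings is off.
    body = value.replace("\\", "\\\\").replace("'", "''")
    # With SCS on, the E prefix asks the server to keep treating
    # backslashes as escapes, so the body means the same thing.
    prefix = "E" if scs_on else ""
    return prefix + "'" + body + "'"

print(quote_literal("foo\\bar", scs_on=False))
print(quote_literal("foo\\bar", scs_on=True))
```

Either output should round-trip to the same value on the server, provided
the scs_on flag passed in matches the server's setting at that moment.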
We can query upon connection if standard_conforming_strings is on.
If it is on, we return quoted values as E'' instead of ''. However,
a user may change standard_conforming_strings at any time, even
temporarily within a transaction, which means that if we are going
to change the E <=> '' on the fly, we need to check the SCS constantly.
Luckily, libpq already tracks changes to this variable and stores
it internally, so the lookup is very cheap. Unfortunately, this
is only available in newer versions of libpq, so we've got the old
game of juggling different versions of libpq and the target database.
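The lookup with a fallback might look roughly like the following. The
FakeConnection class and its parameter_status and show methods are
stand-ins I made up so the example runs on its own; in libpq itself the
cheap lookup would presumably go through PQparameterStatus.

```python
class FakeConnection:
    """Hypothetical stand-in for a database handle, for illustration only."""

    def __init__(self, reported, server_value):
        self.reported = reported          # parameters the client library tracks
        self.server_value = server_value  # what the server would answer to SHOW
        self.queries = []                 # round trips we were forced to make

    def parameter_status(self, name):
        # An older client library simply does not know the parameter.
        return self.reported.get(name)

    def show(self, name):
        self.queries.append("SHOW " + name)
        return self.server_value


def scs_is_on(conn):
    """Return True if standard_conforming_strings is currently on."""
    cached = conn.parameter_status("standard_conforming_strings")
    if cached is not None:
        return cached == "on"             # cheap path, no round trip
    # Fallback for older libraries: ask the server directly.
    return conn.show("standard_conforming_strings") == "on"


newer = FakeConnection({"standard_conforming_strings": "on"}, "on")
older = FakeConnection({}, "off")
print(scs_is_on(newer), newer.queries)
print(scs_is_on(older), older.queries)
```

The point is the cost difference: the first path is a local lookup, the
second is a query to the server each time it is used.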
Here are some possible solutions, just to throw out there for thought:
1) Check SCS on startup, and if on, use E'' from that point onward.
2) Check SCS via libpq before every quote call. If libpq is too old
to know about SCS, check ourselves via a SHOW call.
3) Check SCS via libpq before every quote call. If libpq is too old
to know about SCS, do the same logic as #1.
4) Have a switch that allows it to be toggled between #1 and #2/#3.
5) If the server version supports E'' (e.g. >= 8.1), use that all
the time, period.
6) Do #5, but allow a switch to turn it off or do some other behavior.
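To compare the options side by side, here is a small Python sketch. The
Quoter class, the policy names, and the probe callable are all invented
for this example; probe stands for whatever check of SCS is available.

```python
class Quoter:
    """Toy model of options 1, 2/3 and 5 above; not real driver code."""

    def __init__(self, probe, policy):
        self.probe = probe          # callable returning True when SCS is on
        self.policy = policy        # "startup", "per_call" or "always"
        self.at_connect = probe()   # value seen once, at connection time

    def use_e(self):
        if self.policy == "always":     # option 5: E'' unconditionally
            return True
        if self.policy == "startup":    # option 1: trust the startup value
            return self.at_connect
        return self.probe()             # options 2/3: re-check on each call

    def quote(self, value):
        body = value.replace("\\", "\\\\").replace("'", "''")
        return ("E" if self.use_e() else "") + "'" + body + "'"


state = {"scs": False}
probe = lambda: state["scs"]
startup = Quoter(probe, "startup")
per_call = Quoter(probe, "per_call")
always = Quoter(probe, "always")

state["scs"] = True   # the user flips the setting after connecting
for q in (startup, per_call, always):
    print(q.policy, q.quote("a\\b"))
```

After the setting flips, the "startup" quoter still emits a plain ''
literal with doubled backslashes, which the server would now read as two
backslashes. That is the weakness of checking only once.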
#5 is probably the "safest", but also the least backwards compatible
for anything generating quotes to use on an older version, and may
very well break applications that aren't expecting the 'E'.
I'm leaning towards #1 at the moment, but want to get some other
minds to look at the problem. Also imagine a future world in which
this is on by default and people routinely use backslashes to
mean backslashes - should that weigh in our decision now? Finally,
remember that long term we want to support custom types and custom
quoting, so relying on the libpq quoting is probably not wise.
On rereading this, I think I have a fairly decent solution, but
I'd like to see other people reach it independently, so I won't
share it quite yet. :)
- --
Greg Sabino Mullane [EMAIL PROTECTED]
End Point Corporation
PGP Key: 0x14964AC8 200805062309
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----
iEYEAREDAAYFAkghHp0ACgkQvJuQZxSWSsiZqgCg10s0o5s6ptq5aLd+DkzNKQun
ihIAoOKbCt/Sq6Ub4KQYWrZoHT9h3vea
=W/nc
-----END PGP SIGNATURE-----