-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Thanks David W. My replies below.

>> We will utilize a connect attribute (enabled by default) to enable the
>> use of an immediate SET client_encoding.  The current name of this is
>> "pg_utf8_strings", but DWC prefers something non-encoding specific;
>> examples wanted, but "pg_unicode" or "pg_internal" seem best.

> pg_decode_strings. Or pg_encode_strings, depending on how you look at it.

Yeah, I'm opposed to "pg_internal". I'm okay with the others, but I 
don't feel anyone has stumbled upon a name that "feels" right yet.
I'll use 'pg_unicode' for the rest of this email.

> +If the "pg_internal" attribute is explicitly provided in the DBI
> +connect attributes it will be one of (0, 1), to enable/disable the
> +pg_internal behavior explicitly.  If not provided, we check the
> +initial "server_encoding" and "client_encoding" settings.
> +
> +The logic for setting "pg_internal" when unspecified is:
> +
> + - If "server_encoding" is "SQL_ASCII" set pg_internal to 0.
> +
> + - If "client_encoding" <> "server_encoding", or perhaps better yet if
> +   the pg_setting("client_encoding") returns a different value than
> +   the default version for that setting, then we assume that the
> +   client encoding choice is *explicit* and the user will be wanting
> +   to get raw octets back from DBI, thus set pg_internal to 0.

> I find this description confusing. What is the default value for that 
> setting? 
> I mean, how can one know that?

There is no default: it's computed on the fly at connection time, based 
on the server_encoding and the client_encoding. As the client_encoding 
defaults to the server_encoding, the only way it can be different is 
in the rare case that someone has set it inside of postgresql.conf. In 
which case, we respect that and don't do any transformations at all.
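
To make that concrete, here is a rough sketch of the decision in Python, as I 
read the quoted proposal. It is not DBD::Pg code: the function name is made up 
for illustration, and comparing the encoding names case-insensitively is my 
own assumption.

```python
def default_pg_unicode(server_encoding, client_encoding):
    """Pick the flag when the connect attribute was not given (hypothetical helper)."""
    # SQL_ASCII: the server does no conversion, so hand back raw octets.
    if server_encoding.upper() == "SQL_ASCII":
        return 0
    # A client_encoding that differs from server_encoding is taken as an
    # explicit choice (e.g. set in postgresql.conf), so we respect it.
    if client_encoding.upper() != server_encoding.upper():
        return 0
    # Otherwise the flag is on, which is the "enabled by default" case.
    return 1
```

So a UTF8 server with an untouched client_encoding gets the flag, while a 
SQL_ASCII server or a deliberately different client_encoding does not.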

> But we strongly recommend you set it explicitly to avoid confusion. And 
> really, setting it to 1 is strongly recommended for proper and transparent 
> handling of multibyte characters.

Yes, or some wording along the lines of "this is an expert knob, and you really 
ought to leave it alone unless you really know what you are doing".

> +DWC suggested a DBD::db attribute handle, suggested to be called
> +"encoding" which when set would effectively pass-thru to the
> +underlying: "SET client_encoding = $blah" and *disable* the
> +pg_internal flag.  Specifically, by setting the encoding attribute,
> +you are effectively indicating that you want the data from PostgreSQL
> +back

> I like this *so* much better.

Better than what? This is in addition to the above, to be clear. This is 
basically a shortcut for someone setting pg_unicode false and issuing 
a "SET client_encoding = 'foo'". I'm still on the fence about making 
such a shortcut into a formal call. The advantage is that it removes 
the case where someone sets client_encoding manually but forgets to 
switch pg_unicode off.
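
A minimal sketch of what such a shortcut would do, again in Python and again 
purely illustrative: FakeHandle, set_encoding, and the crude name check are all 
invented here, and the handle only records SQL rather than talking to a server.

```python
class FakeHandle:
    """Stand-in for a database handle; records SQL instead of running it."""

    def __init__(self):
        self.pg_unicode = 1   # flag on, as after a default connect
        self.sent = []        # statements we would have sent

    def do(self, sql):
        self.sent.append(sql)

    def set_encoding(self, name):
        """Hypothetical shortcut: flag off and SET client_encoding in one step."""
        # Crude sanity check so the name can be interpolated safely.
        if not name.replace("_", "").isalnum():
            raise ValueError("suspicious encoding name: %r" % name)
        self.pg_unicode = 0
        self.do("SET client_encoding = '%s'" % name)
```

Because both steps happen together, there is no window in which the 
client_encoding has changed but the flag is still on.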

> +If such a mechanism *was* instituted, we could utilize `pg_encoding =>
> +'blah'` as the connection-level attribute and just tie the underlying
> +implementation of the pg_internal mechanism to this, by having a
> +keyword ('internal') as the special-case encoding, which could be
> +enabled/disabled via $dbh->{pg_encoding} = 'internal';

> WTF is internal?

I'm not sure what David C is saying above, to be honest.

> Seems to me that with pg_encoding you don't need pg_internal at all. You 
> just have a default value for pg_encoding, which would be:
>
> * If "client_encoding" is not set to its default value, DBD::Pg assumes that 
> the choice is explicit, so use that.
> * Else if "server_encoding" is "SQL_ASCII" set pg_encoding to "SQL_ASCII".
> * Else use "utf-8".

We still need a flag to know if we are unicoding or not. We cannot tell just 
from a stored client_encoding.

> +Behavior changes if pg_internal is set
> +--------------------------------------

> Or if pg_encoding eq 'utf-8'.

No: what if someone changes the encoding later? In that case, we do *not* 
want to unicodalize (yep, making up words left and right here) the strings 
coming back from the database.

> +When processing the result sets returned by the server, if pg_internal
> +is set, we can either fiat that the "client_encoding" is set to UTF-8
> +as it was originally when we switched it on connection, or verify that
> +the libpq's result set charset/encoding is equal to UTF-8.  I believe
> +this is available as an int, which could be cached when we do the
> +original "SET client_encoding" and/or initial setup tests, which
> +should prevent accidental corruption.
>
> Or just strongly recommend that if you want to change it, set pg_encoding 
> instead of executing SET CLIENT_ENCODING.

Yeah: I'm not keen on checking the client_encoding every single time we 
get a resultset back from the server, no matter how cheap the check. 
As David W implies, people should use the encoding interface or suffer 
the consequences.

> + - if pg_internal is 1 and incoming SV's UTF8 flag is 1, we
> +   do nothing; the underlying (char*) will already be in utf-8 data.

> Maybe. utf8 ne UTF-8, quite.

Right, but it is the best we can do.

> +  - treat as latin-1/perl raw.  This may be a good default choice,
> +    but I'm not 100% convinced; in any case we would need to
> +    convert from raw to utf-8 using utf8::upgrade.

> I think this is basically what Perl assumes, so it's probably pretty 
> safe. It would also be the reasonable thing to do if pg_encoding 
> is set to something other than utf-8: you assume the user knows what 
> she's doing and passing things in the proper encoding.

Agree with the first, but not with the second: once the user sets pg_encoding, 
we stop messing with their data, both incoming and outgoing, in the expectation 
that they have entered expert mode and want to handle things themselves. 
Or at the very least, we have separate flags for incoming and outgoing tweaking.
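
Here is how I picture the incoming side, as a loose Python analogy rather than 
actual driver code. The assumptions are mine: a Python str stands in for a Perl 
scalar with the UTF8 flag on, bytes for one with the flag off, and 
outgoing_octets is a made-up name.

```python
def outgoing_octets(value, pg_unicode):
    """Return the octets we would hand to the server (hypothetical helper)."""
    if isinstance(value, str):
        # Models a scalar with the UTF8 flag on: it is already character
        # data, so its wire form is simply its UTF-8 encoding.
        return value.encode("utf-8")
    if not pg_unicode:
        # Expert mode: the user manages encodings, so do not touch the bytes.
        return value
    # Flag on, unflagged scalar: treat as Latin-1 and upgrade to UTF-8,
    # which is roughly what utf8::upgrade does on the Perl side.
    return value.decode("latin-1").encode("utf-8")
```

With the flag on, the single byte 0xE9 goes out as the two-byte UTF-8 sequence 
0xC3 0xA9; with the flag off, it goes out exactly as it came in.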

> +       a) switch client_encoding for query to the original
> +          client_encoding, while somehow still retaining the utf-8
> +          client encoding for result set retrieval, or,

I can't see this one working out.

> +DWC feels strongly that we should avoid setting the SvUTF8 flag on any
> +retrieved/created SV which does not require it;

GSM feels just as strongly we should set it on everything.

- -- 
Greg Sabino Mullane [email protected]
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201107140921
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8



-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk4e7aMACgkQvJuQZxSWSshvSACcCYAF22e4lEYPDPyPbd0XhoAi
kyMAoK7Z/rHE1wAMBybAf/PTcSWS7tiK
=4NAD
-----END PGP SIGNATURE-----

