On Jul 17, 2011, at 11:11 AM, Greg Sabino Mullane wrote:
> Well, it will set it to UTF-8, unless there is a really good reason not to.
> And the only exceptions are SQL_ASCII and if they went out of their way to
> set the client encoding themselves, in which case it would be rude of us
> to change it back on them. :)
Okay, put that way I understand it. I think that should be the introductory
paragraph, followed by a bulleted list explaining the situations in which it
would be off.
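
For concreteness, here is roughly how I read that default, as a small Perl sketch. The sub name and its arguments are made up for illustration, and I am assuming "SQL_ASCII" refers to the database's own encoding; none of this is actual DBD::Pg code.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of DBD::Pg: pick the client_encoding to use
# at connect time, following the rule quoted above.
#   $server_encoding - the database encoding, e.g. 'UTF8' or 'SQL_ASCII'
#                      (assumption: the SQL_ASCII exception is about this)
#   $user_set        - true if the user set client_encoding themselves
#   $current         - the client_encoding currently in effect
sub default_client_encoding {
    my ($server_encoding, $user_set, $current) = @_;

    # Exception 1: SQL_ASCII database, so leave the encoding alone.
    return $current if uc($server_encoding) eq 'SQL_ASCII';

    # Exception 2: the user went out of their way to choose an encoding,
    # and it would be rude to change it back on them.
    return $current if $user_set;

    # Otherwise default to UTF-8 ('UTF8' is PostgreSQL's name for it).
    return 'UTF8';
}
```

So a LATIN1 database with no explicit user setting would end up with a UTF8 client_encoding, while the two exceptions keep whatever was already in effect.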
>>> Better than? This is in addition to the above, to be clear. This is
>>> basically a shortcut for someone setting pg_unicode false and issuing
>>> a "SET client_encoding = 'foo'".
>
>> Unless I set it to "utf8", in which case pg_unicode would be true and
>> client_encoding would be set to "UTF-8". Right?
>
> Right. Although in most cases that will be a no-op as those will already
> be set that way. Although a weak case could be argued that setting it
> to UTF-8 via the interface should turn pg_unicode *off*, to be consistent.
> But I think that's all the more reason for a separate knob, and one of the
> reasons I'm only lukewarm to the whole $h->{encoding} thing.
I think that setting pg_encoding should always turn pg_unicode *on*.
>> From the user's perspective, I think it makes much more sense. It says,
>> "Here is what I want the encoding to be," which is easier to understand
>> than "Should we or should we not convert the incoming data to Perl's
>> internal form." Most people won't know WTF that means.
>
> Yeah, that's true. On the other hand, even the encoding setting is meant
> as sort of an expert knob.
Maybe. I think a lot of existing installations may find they need to turn it
off, unless they had been using pg_enable_utf8 before.
>>> We still need a flag to know if we are unicoding or not. We cannot tell
>>> just from a stored client_encoding.
>
>> Why not? That's what pg_unicode was figuring out on its own if you didn't
>> set it.
>
> Yes, but once we call $h->{encoding}, we need to track both the encoding and
> the fact that we are decoding or not. Which could be either way. Which raises
> a point: if we need a way to get things back to "normal" after the user
> sets $h->{encoding} to something weird, presumably they would then call
> $h->{encoding} = UTF-8. So perhaps that answers the above: we turn pg_unicode
> *on* in that case. But it still means that there is no way for someone to
> want a UTF-8 client_encoding but NOT want us to decode things. Sigh.
I think that setting pg_encoding should turn on pg_unicode, unless it's set to
:raw or something. Then someone could always explicitly set both to make it do
what they mean.
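
A rough sketch of that rule in Perl, just to pin down what I mean. The sub is hypothetical, the attribute names are the ones we have been tossing around in this thread, and the resolution logic is my assumption rather than anything DBD::Pg does today.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT DBD::Pg code: work out the effective settings
# from whatever attributes the user passed in.
sub resolve_encoding_attrs {
    my (%attr) = @_;
    my %out = (
        pg_encoding => $attr{pg_encoding},
        pg_unicode  => $attr{pg_unicode},
    );

    # Setting pg_encoding implies pg_unicode on, unless it is ':raw'.
    # An explicitly passed pg_unicode is always left alone.
    if (defined $out{pg_encoding} && !defined $out{pg_unicode}) {
        $out{pg_unicode} = $out{pg_encoding} eq ':raw' ? 0 : 1;
    }

    # If neither was given, pg_unicode stays undef and the driver's
    # default behavior would apply.
    return \%out;
}
```

With this, resolve_encoding_attrs(pg_encoding => 'UTF8', pg_unicode => 0) covers the case of wanting a UTF-8 client_encoding without any decoding, because the explicit pg_unicode wins.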
> (some more of the same arguments trimmed from your reply)
Yeah, sorry. :-)
>>> Or at the very least, we have separate flags for incoming and outgoing
>>> tweaking.
>
>> Oy. Let's not go there yet.
>
> How about now? :) The problem is that people have existing scripts that we
> don't want to fail, and are trying to shove who-knows-what into the database,
> so we definitely want to clean up their mess as it comes in, but give them
> the option not to mess with it in case that is what they need. I think that
> should be a separate knob from the stuff coming back from the database. To
> put another way, I'm happy linking the two together for most things but
> providing an expert knob just in case they need it that can de-couple them.
Oh I agree, I just think it's worth putting off until this other stuff gets
sorted out.
> I'm trying to make this as bulletproof as possible so that we break as few
> existing scripts as possible on the first release, and allow as much
> fine-tuning as needed from the get-go, since we cannot know what will really
> break or the strange combinations people will want until this is released in
> the wild.
The truth is, unless we pay attention to whether pg_enable_utf8 was set in such
scripts, and to what value, suddenly having stuff be encoded and decoded when
it wasn't before may surprise some folks. It *shouldn't*, but it will be
different from what it was doing before.
Have you asked Tim Bunce about any of this stuff? I know he has thought about
adding encoding knobs to the DBI core, but I don't know how far along he got
in thinking about a design.
Best,
David