> I think only text types and text-like types (Greg, how does DBD::Pg
> determine this, currently? I'd want CITEXT data to be converted to
> UTF-8, too; is there some way to tell it what types should be utf8?)

As far as data coming out of the database goes, it's only the four text-like types I mentioned earlier. See line 3329 of dbdimp.c. We might want to make that an exclusion check, and/or go global as mentioned below.

Now that I've had some time to recall things, I think the primary reason there isn't more automagic behavior is simply efficiency. Parsing every string coming out of the database for "utf-8ness" is expensive. Checking client_encoding is also expensive, although libpq at least tracks that for us, so it's not as bad as it first looks.

So the next question is: why don't we just flip the utf8 flag on for all strings coming back from the database? What are the drawbacks?

I need to brush up on my Unicode-fu, but let's keep the discussion going; I'd love to see this solved in a way that limits or removes the need for things like setting specific utf8 flags via the database handle.

--
Greg Sabino Mullane [EMAIL PROTECTED]
End Point Corporation
PGP Key: 0x14964AC8 200809091043
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
