-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

> Uh, say what? Just as I need to
>
>     binmode STDOUT, ':utf8';
>
> Before sending stuff to STDOUT (that is, turn off the flag), I would
> expect DBDs to do the same before sending data to the database.
> Unless, of course, it "just works".

I cannot imagine the flag really matters. We (Pg) simply dump a
bunch of chars to the database, and build it by slurping in the
string character by character until we hit a null. I suppose other
databases may do things differently, but I can't imagine how/why.

>> Yes, very bad example. Let's call it utf8. Forget 'unicode' entirely.

> Yeah, better, though it just perpetuates Perl's unfortunate use of
> the term "utf8" for "internal string representation." Though I suppose
> that ship has sunk already.

Yep. To paraphrase horribly, "Perl's unicode support is the worst,
except for all the other languages".

>> Because it may still need to convert things. See the ODBC discussion.
>
> Oh, so you're saying it will decode and encode between Perl's internal
> form and UTF-8, rather than just flip the flag on and off?

Yes, that's a possibility.

> Yes, because you were only talking about utf8 and UTF-8, not any
> other encodings. Unless I missed something. If the data coming back
> from the DB is Big5, I may well want to have some way to decode it
> (and to encode it for write statements).

You mean at the DBD level - such that you can say to the database,
"I don't care what encoding you stored it as, I want it encoded as X
when you give it back to me"? (update: yes, see below)

>> Well, because utf-8 is pretty much a de facto encoding, or at least
>> way, way more popular than things like ucs2. Also, the Perl utf8
>> flag encourages us to put everything into UTF-8.
>
> Yeah, but again, that might be some reason to call it something else,
> like "perl_native" or something. The fact that it happens to be UTF-8
> should be irrelevant. Er, except, I guess, you still have to know the
> encoding of the database.

Well, I wouldn't call it irrelevant. At the end of the day, we can
call it perl_native, but that's just going to cause people to look it
up in the docs and then say "aha! that means the utf8 flag is on" and
then they have "perl_native -> utf8" burned into their head. Or worse,
"perl_native -> unicode". :)

>> * 'A': the default, it means the DBD should do the best thing, which in most
>> cases means setting SvUTF8_on if the data coming back is UTF-8.
>> * 'B': (on). The DBD should make every effort to set SvUTF8_on for returned
>> data, even if it thinks it may not be UTF-8.
>> * 'C': (off). The DBD should not call SvUTF8_on, regardless of what it
>> thinks the data is.

> I still prefer an encoding attribute that you can set as follows:
> * undef: Default; same as your A.
> * ':utf8': Same as your B.
> * ':raw': Same as your C.
> * $encoding: Encode/decode to/from $encoding

I like that. Although the names are still odd. I guess it does map
though: raw means no utf8 flag. Still not sure about the encode 'to',
but I'll start thinking about how we could implement the 'from' in
DBD::Pg. How would one map things - just demand that whatever is
given must be a literal encoding the particular database can
understand?

> With an encoding attribute, you don't need the utf8_flag at all.

Right, +1

So the above means these two actually behave very differently:

    $dbh->{encoding} = ':utf8';
    $dbh->{encoding} = 'utf8';

Could be a little confusing, no? Methinks we need some long ugly
name, maybe even worse than "perl_native". Perhaps
"perl_internal_utf8_flag"? 1/2 :)

Thanks for plugging away at this. My short term goal is to get this
finalized enough that I can release the next version of DBD::Pg
without a 'pg_' prefix to control the encoding items.

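P.S. To make the four cases concrete, here is a rough sketch of the
'from' direction in plain Perl. The helper name from_db() is made up
for illustration, and none of this is DBI or DBD::Pg API; it only
assumes the core Encode module and treats the proposed attribute
values as plain strings. Take it as one possible reading of the
proposal, not a design.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of DBI or any DBD): what a driver might do
# to a byte string fetched from the database, given the proposed
# "encoding" attribute value.
sub from_db {
    my ($bytes, $encoding) = @_;

    if (!defined $encoding) {
        # Default ('A'): treat as characters only if it is well-formed UTF-8
        my $copy = $bytes;
        return utf8::decode($copy) ? $copy : $bytes;
    }
    if ($encoding eq ':raw') {
        # Off ('C'): hand back the bytes untouched, flag never set
        return $bytes;
    }
    if ($encoding eq ':utf8') {
        # On ('B'): flip the flag without validating, like SvUTF8_on
        my $copy = $bytes;
        Encode::_utf8_on($copy);
        return $copy;
    }
    # Anything else is assumed to be an encoding name Encode understands
    return Encode::decode($encoding, $bytes);
}

my $utf8_bytes = "caf\xc3\xa9";    # "cafe" with e-acute, as UTF-8 bytes

printf "%-8s length %d\n", 'undef', length from_db($utf8_bytes, undef);    # 4
printf "%-8s length %d\n", ':raw',  length from_db($utf8_bytes, ':raw');   # 5
printf "%-8s length %d\n", ':utf8', length from_db($utf8_bytes, ':utf8');  # 4
printf "%-8s length %d\n", 'utf8',  length from_db($utf8_bytes, 'utf8');   # 4
```

For well-formed input the ':utf8' and 'utf8' results are the same
string, which is part of why the naming worries me. The two should
only diverge on malformed bytes, where ':utf8' flips the flag blindly
and 'utf8' goes through Encode's decoder.
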
- --
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201110061151
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk6Nz28ACgkQvJuQZxSWSsiWJQCgt/F0r/sCPDa9GuYrGZpZHlQ2
WfYAn0asIYHmPKz1BDfcBo7wLADHmH7N
=eJmk
-----END PGP SIGNATURE-----