Re: BOM and principle of least surprise

Erland Sommarskog Sun, 16 May 2004 04:12:43 -0700

Jarkko Hietaniemi ([EMAIL PROTECTED]) writes:
>> Both input data and the script. Just because the script has been saved
>> in UTF-8, does not mean that literals in the script are taken as UTF-8.
> 
> Oh, great.  Now you want to mix different encodings in the same file.
> I give up :-)


I think you misunderstood me. This script was in my original post:

   use strict;
   
   use MSSQL::OlleDB;
   $| = 1;
   my $i = 0;
   foreach (1..2) {
      my $db = 'räksmörgås'; 
      print "Len " . length($db) . " Str: $db\n";
      my $X = MSSQL::OlleDB->connect(undef, undef, undef, $db);
      $i++;
      print "$i\n" if $i % 50 == 0
   }
 
This script is supposed to connect to a database called "räksmörgås",
a name which SQL Server stores as Unicode, in UTF-16. OlleDB is my XS
module; it uses SvUTF8 to determine whether $db is in UTF-8 or not, and
then converts to UTF-16 from either the ANSI code page or UTF-8.
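
The conversion rule described above can be sketched in plain Perl rather
than XS. This is only an illustration under stated assumptions:
to_utf16le is a hypothetical helper, not part of MSSQL::OlleDB, and
cp1252 is assumed as the ANSI code page. It uses utf8::is_utf8 as the
Perl-level counterpart of SvUTF8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not OlleDB's actual code): pick the source
# encoding from the UTF-8 flag, then produce UTF-16LE bytes.
sub to_utf16le {
    my ($str, $ansi_cp) = @_;
    $ansi_cp ||= 'cp1252';    # assumed ANSI code page

    # Flag off: treat the scalar as bytes in the ANSI code page.
    $str = decode($ansi_cp, $str) unless utf8::is_utf8($str);

    # Flag on (or just decoded): the scalar holds characters.
    return encode('UTF-16LE', $str);
}

my $ansi  = "r\xE4ksm\xF6rg\xE5s";                    # ANSI bytes, flag off
my $raw   = "r\xC3\xA4ksm\xC3\xB6rg\xC3\xA5s";        # UTF-8 bytes, flag off
my $utf8  = decode('UTF-8', $raw);                    # characters, flag on

print "ANSI vs flagged UTF-8:      ",
    (to_utf16le($ansi) eq to_utf16le($utf8) ? "same" : "different"), "\n";
print "unflagged UTF-8 vs flagged: ",
    (to_utf16le($raw) eq to_utf16le($utf8) ? "same" : "different"), "\n";
```

The first comparison should agree and the second should not: UTF-8
bytes that arrive without the flag are read as ANSI and turn into a
different, longer name.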

First I saved the script in ANSI format, and I connected as expected.
Then I saved the script in UTF-8. The file still read "räksmörgås"
when I looked at it, but SvUTF8 still returned false, so the UTF-8
bytes were converted as if they were ANSI and I did not connect to
the database successfully.
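
The effect can be reproduced without a database. A minimal sketch,
assuming the file that holds it is saved as UTF-8 (otherwise the
lengths will differ). The "use utf8" pragma shown here is one existing
way to declare the source encoding; it is not necessarily the option
discussed in this thread, and it does not look at byte-order marks.

```perl
use strict;
use warnings;

my ($bytes, $chars);
{
    no utf8;                 # the default: the literal is kept as raw bytes
    $bytes = 'räksmörgås';
}
{
    use utf8;                # declare that this block's source text is UTF-8
    $chars = 'räksmörgås';
}

# Report only lengths and flags, so nothing non-ASCII is printed.
for my $case ([ 'no utf8', $bytes ], [ 'use utf8', $chars ]) {
    my ($label, $str) = @$case;
    printf "%-8s length %2d, UTF-8 flag %s\n",
        $label, length($str), utf8::is_utf8($str) ? 'on' : 'off';
}
```

Saved as UTF-8, the first literal should come out as 13 bytes with the
flag off and the second as 10 characters with the flag on. The flag is
what an XS module sees through SvUTF8.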

>> To be able to do that, it would have to understand byte-order marks
>> (which it doesn't). I think there was a suggestion that you could
>> specify an 
>
>In 5.8.5 it will.

Will such an option include the possibility to say that I want Perl to
determine the encoding from the byte-order mark?

-- 
Erland Sommarskog, Stockholm, [EMAIL PROTECTED]
