At 9:28 pm +0200 22/6/06, Tommy Nordgren wrote:

On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:

How do I write proper UTF-8 characters to a file? I write only two characters, and they come out as four garbage characters when I view the file in an editor.

The reason is almost certainly that your editor is set to open files as MacRoman or some other non-UTF-8 charset. If your editor preferences are set to open files as UTF-8, or you choose UTF-8 in the Open dialog, you will not get this problem.
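To see why two characters turn into four, here is a minimal Perl sketch that encodes a string as UTF-8 and then decodes those bytes as MacRoman, as a misconfigured editor would. It assumes the core Encode module and its MacRoman table; the sample string "öå" and the helper name misread_as_macroman are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sketch: simulate an editor that opens a UTF-8 file as MacRoman.
# misread_as_macroman is a hypothetical helper name.
sub misread_as_macroman {
    my ($chars) = @_;                       # Perl character string
    my $bytes = encode('UTF-8', $chars);    # what ends up on disk
    return decode('MacRoman', $bytes);      # what a MacRoman editor shows
}

my $original = "\x{F6}\x{E5}";              # U+00F6 U+00E5: two characters
my $shown    = misread_as_macroman($original);
binmode STDOUT, ':encoding(UTF-8)';
printf "%d chars written, %d chars displayed: %s\n",
    length $original, length $shown, $shown;
```

Each of the two characters becomes two UTF-8 bytes, and MacRoman maps every single byte to a character of its own, so four characters appear.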


I found the problem. It is necessary to:
1) use the use utf8 pragma;
2) explicitly write a BOM byte sequence immediately after opening the file.
Point 2 is where I erred: I expected the BOM to be added automatically when opening a file for writing with the UTF-8 encoding.
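If a BOM is wanted anyway (for example as a hint to an editor), a minimal sketch of writing one explicitly might look like this. It assumes an :encoding(UTF-8) output layer and the core File::Spec module; the helper name write_utf8_with_bom and the file name bom-test.txt are hypothetical. The characters are written as \x{...} escapes, so use utf8 has no effect here; it only matters when the script contains literal non-ASCII text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # matters only if this file has non-ASCII literals
use File::Spec;

# Sketch: write text as UTF-8, preceded by an explicit BOM.
# The helper name and the output path are hypothetical.
sub write_utf8_with_bom {
    my ($path, $text) = @_;
    open my $fh, '>:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    print $fh "\x{FEFF}";    # U+FEFF, which the layer writes as EF BB BF
    print $fh $text;
    close $fh or die "Cannot close $path: $!";
    return;
}

my $path = File::Spec->catfile(File::Spec->tmpdir, 'bom-test.txt');
write_utf8_with_bom($path, "\x{F6}\x{E5}");    # U+00F6 U+00E5
```

Perl's UTF-8 layer never adds the BOM on its own, which matches what Tommy observed.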

You would need to give an example of what you are doing, but neither of those things should be necessary, nor should it be necessary to specify UTF-8 when opening the filehandle, as Sherm suggested.
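Whether the filehandle needs a UTF-8 layer depends on what you print. A byte string that is already UTF-8 encoded can be written with no layer at all, while a character string (for example one built with \x{F6}, or a literal under use utf8) needs :encoding(UTF-8). This sketch, assuming the core File::Spec module and a writable temp directory, checks that both routes put the same bytes on disk; the helper name write_and_slurp and the temp-file name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Sketch: write data through a given output layer, then read back raw bytes.
# write_and_slurp and the temp-file name are hypothetical.
sub write_and_slurp {
    my ($layer, $data) = @_;
    my $path = File::Spec->catfile(File::Spec->tmpdir, "layer-test-$$.txt");
    open my $out, ">$layer", $path or die "Cannot open $path: $!";
    print $out $data;
    close $out or die "Cannot close $path: $!";
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $raw = do { local $/; <$in> };
    close $in;
    unlink $path;
    return $raw;
}

# (a) Bytes that are already UTF-8 encoded, no layer: written unchanged.
my $via_bytes = write_and_slurp('', "\xC3\xB6");
# (b) A character string, encoded by an :encoding(UTF-8) layer.
my $via_layer = write_and_slurp(':encoding(UTF-8)', "\x{F6}");
print $via_bytes eq $via_layer ? "identical\n" : "different\n";
```

Mixing the two (a character string with no layer, or already encoded bytes with a layer) is what produces wrong or double-encoded output.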

The following script will write "ö", UTF-8-encoded, to "trash.txt" on the desktop:

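#!/usr/bin/perl
use strict;
use warnings;
# No "use utf8": this file is saved as UTF-8, so the literal "ö" below
# is the two bytes C3 B6, and they are printed unchanged.
my $text = "ö";
my $f = "$ENV{HOME}/desktop/trash.txt";
open my $fh, '>', $f or die "Cannot open $f: $!";
print $fh $text;
close $fh or die "Cannot close $f: $!";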
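To check what actually landed in the file without relying on an editor's charset guess, a small hex dump helps. This is a minimal sketch; the helper name hex_bytes is hypothetical, and it defaults to the trash.txt path used in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: show a file's bytes in hex, independent of any editor's charset.
# hex_bytes is a hypothetical helper name.
sub hex_bytes {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    my $raw = do { local $/; <$in> } // '';    # slurp the whole file
    close $in;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $raw;
}

# Defaults to the trash.txt path from the script above.
my $f = shift(@ARGV) // "$ENV{HOME}/desktop/trash.txt";
if (-e $f) {
    print hex_bytes($f), "\n";
} else {
    warn "No such file: $f\n";
}
```

For the script above it should print C3 B6, the UTF-8 encoding of "ö".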

If you open the file as UTF-8 you will see "ö", and if you open it as MacRoman you will see "√∂". You could also open it as Traditional Chinese, Simplified Chinese or many other encodings and see something different each time. UTF-8 byte order is always the same, so there is no need for a BOM, though some editors may use one as a hint that a file is UTF-8.
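As an illustration of the "hint" idea, here is a minimal sketch of how a reader might use a leading UTF-8 BOM (EF BB BF) to pick the charset, falling back to another charset otherwise. The helper name decode_with_bom_hint is hypothetical, it assumes the core Encode module, and real editors use their own, more elaborate heuristics.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Sketch: treat a leading UTF-8 BOM as a hint, else use a fallback charset.
# decode_with_bom_hint is a hypothetical helper name.
sub decode_with_bom_hint {
    my ($bytes, $fallback) = @_;    # raw file contents, fallback charset
    if ($bytes =~ s/\A\xEF\xBB\xBF//) {
        return ('UTF-8 (BOM)', decode('UTF-8', $bytes));
    }
    return ($fallback, decode($fallback, $bytes));
}

for my $raw ("\xEF\xBB\xBF\xC3\xB6", "\xC3\xB6") {
    my ($charset, $text) = decode_with_bom_hint($raw, 'MacRoman');
    printf "%-12s %d char(s)\n", $charset, length $text;
}
```

Without the BOM, the same two bytes fall through to the MacRoman fallback and come out as two characters instead of one.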

JD
