At 9:28 pm +0200 22/6/06, Tommy Nordgren wrote:
On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:
How do I write proper UTF-8 characters to a file? I write only
two characters, and they come out as four
garbage characters when I view the file in an editor.
The only reason for that can be that you have your editor set to open
files as MacRoman or some non-utf-8 charset. Provided your editor
prefs are set to open as utf-8 or you opt for utf-8 in the open file
dialog you will not get this problem.
I found the problem. It is necessary to
1) use the "use utf8" pragma;
2) explicitly write a BOM byte sequence immediately after opening the file.
Point 2 is where I erred. I expected the BOM to be added automatically
when opening a file for writing with the UTF-8 encoding.
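The two steps listed above could look roughly like the following. This is only a sketch, not the poster's actual code: the helper names write_with_bom and file_hex are hypothetical, the output goes to a temporary file, and the script is assumed to be saved as UTF-8. It shows that an :encoding(UTF-8) layer writes the character itself as UTF-8, while the BOM appears only because U+FEFF is printed explicitly.

```perl
use strict;
use warnings;
use utf8;                       # string literals in this source are decoded as UTF-8
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through a UTF-8 encoding layer,
# preceded by an explicit BOM (U+FEFF). The layer does not add one itself.
sub write_with_bom {
    my ($path, $text) = @_;
    open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
    print {$out} "\x{FEFF}", $text;
    close($out) or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: return the raw bytes of a file as a lowercase hex string.
sub file_hex {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$in> };
    close($in);
    return unpack('H*', $bytes);
}

# Write one character to a temporary file and show what landed on disk.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
write_with_bom($tmp_path, "ö");
print file_hex($tmp_path), "\n";
```

With a UTF-8 source file this should print efbbbfc3b6: the three BOM bytes EF BB BF followed by C3 B6 for the character. Whether the BOM is wanted at all depends on the editor reading the file.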
You would need to give an example of what you are doing, but neither
of those things should be necessary, nor should it be necessary to
specify utf-8 when opening the filehandle, as Sherm suggested.
The following script will write "ö", utf8-encoded to "trash.txt" on
the desktop:
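#!/usr/bin/perl
use strict;
use warnings;
# No "use utf8" and no encoding layer: assuming this script is saved as
# UTF-8, the literal's bytes are written to the file unchanged.
my $text = "ö";
my $f = "$ENV{HOME}/desktop/trash.txt";
open(my $fh, '>', $f) or die "Cannot open $f: $!";
print {$fh} $text;
close($fh) or die "Cannot close $f: $!";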
If you open the file as utf-8 you will see "ö" and if you open it as
MacRoman you will see "√∂". You could also open it as Traditional
Chinese or Simplified Chinese or many other things and see other
things. UTF-8 byte order is always the same, so there is no need for
a BOM, though some editors might use it as a hint.
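To make the point about editors concrete, here is a small sketch using the core Encode module. It takes the UTF-8 bytes for U+00F6 and decodes the same two bytes once as UTF-8 and once as MacRoman. The helper names hex_of and codepoints are made up for this illustration, and the character is written as an escape so the result does not depend on how the script itself is saved.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+00F6 (o with diaeresis), written as an escape on purpose.
my $char  = "\x{F6}";
my $bytes = encode('UTF-8', $char);   # the bytes that end up in the file

# Hypothetical helper: space-separated hex dump of a byte string.
sub hex_of {
    my ($str) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $str;
}

# Hypothetical helper: space-separated code points of a character string.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

# The same bytes, interpreted under two different charsets.
my $as_utf8     = decode('UTF-8',    $bytes);
my $as_macroman = decode('MacRoman', $bytes);

print "bytes on disk:    ", hex_of($bytes), "\n";            # C3 B6
print "read as UTF-8:    ", codepoints($as_utf8), "\n";      # U+00F6
print "read as MacRoman: ", codepoints($as_macroman), "\n";  # U+221A U+2202
```

The file holds two bytes either way. Read as UTF-8 they form one character; read as MacRoman each byte becomes its own character (a square-root sign and a partial-differential sign), which is the kind of "garbage" described at the top of the thread.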
JD