Re: Writing utf 8 files
> If you open the file as utf-8 you will see "ö" and if you open it as MacRoman you will see "√∂". You could also open it as Traditional Chinese or Simplified Chinese or many other things and see other things. UTF-8 byte order is always the same, so there is no need for a BOM, though some editors might use it as a hint.

Given that his editor seems to have interpreted the file as utf-8 with the BOM in place and as something else without the BOM, we might guess that his editor recognizes the BOM. We could also, of course, guess that his login account is set to default to something other than utf-8, which is also in keeping with my experience with Mac OS X when the user has not deliberately messed around with things.
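The point about one file showing different text under different encodings can be sketched in a few lines. This is only an illustration, assuming the core Encode module and its "MacRoman" encoding name; the variable names are made up for the example, and the source is kept ASCII-only so it does not depend on how the script itself is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that UTF-8 uses for U+00F6 (o with diaeresis).
my $bytes = "\xC3\xB6";

# Same bytes, two interpretations.
my $as_utf8     = decode('UTF-8',    $bytes);
my $as_macroman = decode('MacRoman', $bytes);

# Format a decoded string as a list of code points.
sub codepoints {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

print "as UTF-8:    ", codepoints($as_utf8),     "\n";   # U+00F6
print "as MacRoman: ", codepoints($as_macroman), "\n";   # U+221A U+2202
```

U+221A and U+2202 are the square root and partial differential signs, which is the two-character garbage an editor shows per letter when it guesses MacRoman.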
Re: Writing utf 8 files
At 9:28 pm +0200 22/6/06, Tommy Nordgren wrote:

> On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:
>
>> How do I write proper utf 8 characters to a file? I write only two characters, and they come out as four garbage characters when I view the file in an editor.

The only reason for that can be that you have your editor set to open files as MacRoman or some non-utf-8 charset. Provided your editor prefs are set to open as utf-8, or you opt for utf-8 in the open file dialog, you will not get this problem.

> I found the problem. It is necessary to
> 1) use the "use utf8" pragma;
> 2) explicitly write a BOM byte sequence immediately after opening the file.
> Point 2 is where I erred. I expected the BOM to be added automatically when opening a file for write with the utf-8 encoding.

You would need to give an example of what you are doing, but neither of those things should be necessary, nor should it be necessary to specify utf-8 when opening the filehandle as Sherm suggested. The following script will write "ö", utf8-encoded, to "trash.txt" on the desktop:

    #!/usr/bin/perl
    my $text = "ö";
    my $f = "$ENV{HOME}/desktop/trash.txt";
    open F, ">$f" or die $!;
    print F $text;
    close F;

If you open the file as utf-8 you will see "ö" and if you open it as MacRoman you will see "√∂". You could also open it as Traditional Chinese or Simplified Chinese or many other things and see other things. UTF-8 byte order is always the same, so there is no need for a BOM, though some editors might use it as a hint.

JD
Re: Writing utf 8 files
On Jun 22, 2006, at 3:28 PM, Tommy Nordgren wrote:

> On 22 Jun 2006, at 20:29, Tommy Nordgren wrote:
>
>> On 22 Jun 2006, at 20:15, Sherm Pendley wrote:
>>
>>> On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:
>>>
>>>> How do I write proper utf 8 characters to a file? I write only two characters, and they come out as four garbage characters when I view the file in an editor.
>>>
>>> Quick answer:
>>>
>>>     open FH, ">:utf8", "file";
>>>
>>> Complete answer:
>>>
>>>     perldoc perluniintro
>>>     perldoc PerlIO
>>
>> I've already tried that. That was what I was doing when I got garbage.
>
> I found the problem. It is necessary to
> 1) use the "use utf8" pragma;

That's only needed if your actual Perl code is UTF-8 encoded, like my example was. If your UTF-8 data is coming from an external source, "use utf8" has no effect.

sherm--
Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
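The distinction between literals in the source and data from an external source can be illustrated roughly as follows. This is a sketch, not anyone's actual code from the thread: the temporary file is a hypothetical stand-in for "an external source", and File::Temp is assumed to be available (it ships with Perl). The script contains no non-ASCII literals, so "use utf8" would make no difference to it; what matters for the external data is whether the read handle has a decoding layer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical external source: a temp file holding the two UTF-8
# bytes for U+00F6.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xC3\xB6";
close $tmp or die $!;

# Read with no decoding layer: Perl sees two separate bytes.
open my $raw, '<', $path or die $!;
binmode $raw;
my $undecoded = <$raw>;
close $raw;

# Read through a decoding layer: Perl sees one character.
open my $in, '<:encoding(UTF-8)', $path or die $!;
my $decoded = <$in>;
close $in;

print "without layer: length ", length($undecoded), "\n";   # 2
print "with layer:    length ", length($decoded),   "\n";   # 1
```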
Re: Writing utf 8 files
On 22 Jun 2006, at 20:29, Tommy Nordgren wrote:

> On 22 Jun 2006, at 20:15, Sherm Pendley wrote:
>
>> On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:
>>
>>> How do I write proper utf 8 characters to a file? I write only two characters, and they come out as four garbage characters when I view the file in an editor.
>>
>> Quick answer:
>>
>>     open FH, ">:utf8", "file";
>>
>> Complete answer:
>>
>>     perldoc perluniintro
>>     perldoc PerlIO
>
> I've already tried that. That was what I was doing when I got garbage.

I found the problem. It is necessary to
1) use the "use utf8" pragma;
2) explicitly write a BOM byte sequence immediately after opening the file.

Point 2 is where I erred. I expected the BOM to be added automatically when opening a file for write with the utf-8 encoding.

- This sig is dedicated to the advancement of Nuclear Power
Tommy Nordgren [EMAIL PROTECTED]
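For reference, Perl's ":utf8" and ":encoding(UTF-8)" layers do not emit a BOM on their own, so a script that wants one has to print U+FEFF itself as the first character. A minimal sketch under that assumption follows; the helper name written_bytes_hex and the use of a temporary file are inventions for the example, not part of the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text through a UTF-8 encoding layer, then return the bytes
# that actually landed on disk, as a hex string.
sub written_bytes_hex {
    my ($text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp or die $!;

    open my $out, '>:encoding(UTF-8)', $path or die $!;
    print {$out} $text;
    close $out or die $!;

    open my $in, '<', $path or die $!;
    binmode $in;
    local $/;                       # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return unpack 'H*', $bytes;
}

my $plain    = written_bytes_hex("\x{F6}");           # no BOM written
my $with_bom = written_bytes_hex("\x{FEFF}\x{F6}");   # BOM printed by hand

print "without BOM: $plain\n";      # c3b6
print "with BOM:    $with_bom\n";   # efbbbfc3b6
```

Whether the BOM is desirable depends on what will read the file; as noted elsewhere in the thread, UTF-8 does not need it, and it mainly serves as a hint to editors.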
Re: Writing utf 8 files
On Jun 22, 2006, at 2:29 PM, Tommy Nordgren wrote:

> On 22 Jun 2006, at 20:15, Sherm Pendley wrote:
>
>> On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:
>>
>>> How do I write proper utf 8 characters to a file? I write only two characters, and they come out as four garbage characters when I view the file in an editor.
>>
>> Quick answer:
>>
>>     open FH, ">:utf8", "file";
>>
>> Complete answer:
>>
>>     perldoc perluniintro
>>     perldoc PerlIO
>
> I've already tried that. That was what I was doing when I got garbage.

Well, the above is correct as far as Perl goes - but it doesn't rule out other problems. Are you certain that the editor you're using is interpreting the file correctly, as UTF-8? Also, are you certain that your input really is UTF-8?

For instance, I ran this script to generate a test file:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8; # This allows utf8 in string literals, like below

    open FH, '>:utf8', '/Users/sherm/hello.txt' or die $!;
    print FH "Hëllö, wörld!\n";
    close FH;

When I open the file in BBEdit, I see gibberish, because BBEdit can't determine that it's UTF-8 (there's no BOM), and misinterprets it as the default Mac OS Roman instead. But if I change BBEdit's default encoding, or use the "Reopen Using Encoding" function, BBEdit displays the file correctly.

sherm--
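One way to take the editor out of the picture when asking "is this really UTF-8?" is to try a strict decode of the raw bytes. The following is a rough sketch assuming the core Encode module; is_valid_utf8 is a hypothetical helper name, and the two sample strings are invented test data (the first few letters of the greeting above as UTF-8 bytes and as Latin-1 bytes).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Return 1 if $bytes is well-formed UTF-8, 0 otherwise.
# FB_CROAK makes decode() die on malformed input; eval catches that.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        Encode::decode('UTF-8', $bytes,
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $utf8_bytes   = "H\xC3\xABll\xC3\xB6";   # e and o with diaeresis as UTF-8
my $latin1_bytes = "H\xEBll\xF6";           # same letters as Latin-1

print "UTF-8 bytes:   ", (is_valid_utf8($utf8_bytes)   ? "valid" : "not valid"), "\n";
print "Latin-1 bytes: ", (is_valid_utf8($latin1_bytes) ? "valid" : "not valid"), "\n";
```

If the bytes pass this check and still look wrong on screen, the problem is more likely in how the viewer interprets the file than in how Perl wrote it.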
Re: Writing utf 8 files
On 22 Jun 2006, at 20:15, Sherm Pendley wrote:

> On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:
>
>> How do I write proper utf 8 characters to a file? I write only two characters, and they come out as four garbage characters when I view the file in an editor.
>
> Quick answer:
>
>     open FH, ">:utf8", "file";
>
> Complete answer:
>
>     perldoc perluniintro
>     perldoc PerlIO

I've already tried that. That was what I was doing when I got garbage.

Tommy Nordgren
Re: Writing utf 8 files
On Jun 22, 2006, at 1:48 PM, Tommy Nordgren wrote:

> How do I write proper utf 8 characters to a file? I write only two characters, and they come out as four garbage characters when I view the file in an editor.

Quick answer:

    open FH, ">:utf8", "file";

Complete answer:

    perldoc perluniintro
    perldoc PerlIO

sherm--
Writing utf 8 files
How do I write proper utf 8 characters to a file? I write only two characters, and they come out as four garbage characters when I view the file in an editor.

Tommy Nordgren