Rather than going through the somewhat buggy process of trying to
determine which of the many character sets a file uses, is there some
way that I can just universally convert everything into UTF-8?
I can open a file with a :utf8 layer when creating the file handle.
But do I need to do this for a UTF-8 file, or will perl just "know"?
If it doesn't, can I just open everything in utf8 mode and not lose
any data?
On May 12, 2007, at 5:04 AM, Dr.Ruud wrote:
Tom Allison wrote:
Under perl version 5.8, does /(\w+)/ match UTF-8 characters without
calling any special pragma?
Yes, but only if your data is proper. Mind that any ASCII character
is a UTF-8 character too (U+0000 .. U+007F).
So I'm trying to see if I can just use /(\w+)/ without worrying about
all this character encoding?
Only if your data is proper. A file is just a string of bytes. If you
use the proper IO-layer while reading in the file, then you'll end up
with proper data (a string of characters, not of bytes) to work with.
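To illustrate the difference between bytes and characters, here is a
minimal sketch that reads the same file twice, once raw and once
through a decoding layer. The scratch file, the sample text and the
helper name first_word are made up for the example, and it assumes an
Encode that knows the strict "UTF-8" name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the UTF-8 bytes for "naive cafe" (with i-diaeresis and e-acute)
# to a scratch file, without any encoding layer.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "na\xC3\xAFve caf\xC3\xA9\n";
close $tmp or die "close: $!";

# Hypothetical helper: open the file with the given layer and return
# the first /(\w+)/ match on its first line.
sub first_word {
    my ($layer) = @_;
    open my $fh, "<$layer", $path or die "open $path: $!";
    my $line = <$fh>;
    close $fh;
    my ($word) = $line =~ /(\w+)/;
    return $word;
}

my $from_bytes = first_word(':raw');              # string of bytes
my $from_chars = first_word(':encoding(UTF-8)');  # string of characters

printf "raw read:     matched %d units\n", length $from_bytes;
printf "decoded read: matched %d units\n", length $from_chars;
```

With the decoding layer the match should cover the whole first word.
On the raw byte string it normally stops at the first non-ASCII byte,
because perl has no way of knowing those bytes belong to one character.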
A UTF-8 encoded file can't tell you that it is UTF-8 encoded. For
example a UTF-8 BOM at the start (as Windows Notepad uses) is not
proof.
So you need to know beforehand.
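What you can do is check whether the bytes are at least well-formed
UTF-8. Here is a rough sketch using Encode's strict decoder with
FB_CROAK; the helper name decode_if_valid_utf8 and the sample strings
are invented for the example. Passing the check is still not proof of
the intended encoding, since text in another character set can be
valid UTF-8 by accident.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: return the decoded character string if $bytes is
# well-formed UTF-8, or undef if strict decoding fails.
sub decode_if_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;   # decode() with a CHECK value may modify its argument
    my $chars = eval { decode('UTF-8', $copy, FB_CROAK) };
    return $chars;       # undef when decode() croaked
}

my $utf8   = "caf\xC3\xA9";   # e-acute encoded as UTF-8 (two bytes)
my $latin1 = "caf\xE9";       # e-acute encoded as ISO-8859-1 (one byte)

for my $sample ($utf8, $latin1) {
    my $decoded = decode_if_valid_utf8($sample);
    print defined $decoded ? "well-formed UTF-8\n" : "not UTF-8\n";
}
```

A failed check tells you the file is definitely not UTF-8, so you can
fall back to whatever encoding you know the source uses instead of
reading it with the wrong layer and damaging the data.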
--
Affijn, Ruud
"Gewoon is een tijger."
--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/