Rather than going through the somewhat buggy process of trying to determine which of the many character sets a file uses, is there some way that I can just universally convert everything into UTF-8?

I can open a file with a :utf8 declaration when creating the file handle. But do I need to do this on a UTF-8 file, or will Perl just "know"? If it doesn't, can I just open everything in utf8 mode and not lose any data?


On May 12, 2007, at 5:04 AM, Dr.Ruud wrote:

Tom Allison wrote:

Under perl version 5.8, does /(\w+)/ match UTF-8 characters without
calling any special pragma?

Yes, but only if your data is proper (already decoded into characters).
Note that any ASCII character is a UTF-8 character too (U+0000 .. U+007F).
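To make "proper" concrete, here is a minimal sketch comparing the same word as raw UTF-8 bytes and as a decoded character string. The sample word is only an illustration. It assumes a plain script with no locale and no unicode_strings feature enabled.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "café" as raw UTF-8 bytes: the é is stored as the two bytes 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";

# Undecoded: Perl sees five separate bytes, not four characters.
my ($from_bytes) = $bytes =~ /(\w+)/;

# Decoded: Perl sees four characters, the last one being U+00E9.
my $chars = decode('UTF-8', $bytes);
my ($from_chars) = $chars =~ /(\w+)/;

printf "bytes: matched %d of %d\n", length($from_bytes), length($bytes);
printf "chars: matched %d of %d\n", length($from_chars), length($chars);
```

On the byte string the match stops before the end of the word, while on the decoded string /(\w+)/ captures all of it.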


So I'm trying to see if I can just use /(\w+)/ without worrying about
all this character encoding?

Only if your data is proper. A file is just a string of bytes. If you
use the proper IO-layer while reading in the file, then you'll end up
with proper data (a string of characters, not of bytes) to work with.
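A rough sketch of what using the layer looks like, assuming the file really is UTF-8. The temporary file and its contents are made up here only so the example is self-contained. Note that :encoding(UTF-8) checks the input while decoding, whereas the bare :utf8 layer does not validate it.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file holding UTF-8 bytes (hypothetical data).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;                              # write the bytes untouched
print {$tmp} "na\xC3\xAFve caf\xC3\xA9\n"; # "naïve café" encoded as UTF-8
close $tmp or die "close: $!";

# Read it back through a decoding layer, so we get characters, not bytes.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# With decoded data, \w+ sees the accented letters as word characters.
my @words = $line =~ /(\w+)/g;
printf "%d characters, %d words\n", length($line), scalar @words;
```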

A UTF-8 encoded file can't tell you that it is UTF-8 encoded. For
example a UTF-8 BOM at the start (as Windows Notepad uses) is not proof.
So you need to know beforehand.
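What you can do is reject data that is certainly not UTF-8. Below is a minimal sketch using a strict decode from the Encode module. The helper name decode_utf8_or_undef is made up for this example. Passing the check is still not proof of UTF-8 (plain ASCII passes too), it only means the bytes are well-formed.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns the decoded string, or undef when the
# bytes are not well-formed UTF-8.
sub decode_utf8_or_undef {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode with a CHECK value may modify its input
    # FB_CROAK makes decode die on malformed input; eval turns that
    # into an undef return value.
    my $chars = eval { decode('UTF-8', $copy, FB_CROAK) };
    return $chars;
}

my $good = decode_utf8_or_undef("caf\xC3\xA9");        # valid UTF-8
my $bad  = decode_utf8_or_undef("caf\xE9 au lait");    # Latin-1 bytes

print defined $good ? "first: well-formed\n"  : "first: rejected\n";
print defined $bad  ? "second: well-formed\n" : "second: rejected\n";
```

A file that fails this check has to be decoded with whatever encoding it was actually written in, which still has to be known from elsewhere.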

--
Affijn, Ruud

"Gewoon is een tijger."


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/



