On Sat, Jan 12, 2002 at 05:42:31PM +0100, Holger Rauch wrote:
| Hi!
|
| Thanks for your reply!
|
| On Sun, 6 Jan 2002, dman wrote:
|
| > [...]
| > So the regexps you're using are in a 8859-n source file, right?
|
| Yep.
|
| > Can perl handle UTF-8 source files?
|
| Don't know. That's why I mailed this question ;-)
|
| > Are you trying to use things like the
| > posix character class [:alpha:]?
|
| No.
|
| > I don't think those will handle all
| > alphabetic characters in all unicode supported languages (probably
| > just ascii/english alphabet).
|
| What about \w?

No idea. Here's another thought, though. Are you reading the file as if
it were in a single-byte encoding? If so, that won't work right. For
example, the euro sign is the character \u20ac. In UTF-8 the file will
contain the three bytes '\xe2\x82\xac'. If you read it as you would any
other single-byte file, you'll get three characters above the US-ASCII
range instead of one euro sign.

-D

-- 
The light of the righteous shines brightly,
but the lamp of the wicked is snuffed out.
        Proverbs 13:9
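To make the byte-versus-character point concrete, here is a minimal sketch. It is written in Python rather than Perl (the language this thread is about), and Latin-1 (ISO 8859-1) stands in for "any single-byte encoding". It shows the three bytes the euro sign becomes in UTF-8, then what you get when those bytes are decoded the wrong way and the right way.

```python
# Sketch: the euro sign (U+20AC) encoded as UTF-8, then decoded two ways.
# Latin-1 is used as a stand-in for "reading the file as single-byte".

euro = "\u20ac"                 # one character: the euro sign
raw = euro.encode("utf-8")      # the bytes that actually land in a UTF-8 file
print(raw)                      # b'\xe2\x82\xac' (three bytes)

# Wrong: treat each byte as one character. The result has three characters,
# every one of them above the US-ASCII range (code point > 0x7F).
wrong = raw.decode("latin-1")
print(len(wrong), [hex(ord(c)) for c in wrong])  # 3 ['0xe2', '0x82', '0xac']

# Right: decode as UTF-8 and the single euro sign comes back.
right = raw.decode("utf-8")
print(len(right), right == euro)                 # 1 True
```

Any regexp that then runs over the wrongly decoded text is matching against those three unrelated characters, not against the euro sign, so character classes cannot be expected to behave.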