matching multibyte utf-8 in perl

John Kilbourne Sat, 01 Mar 2003 04:43:58 -0800

Jarkko:
I saw your post in the perl unicode developer list:

From: Jarkko Hietaniemi [mailto:[EMAIL PROTECTED] 
Sent: Friday, January 10, 2003 1:39 PM
To: Merijn van den Kroonenberg
Cc: Narins, Josh; [EMAIL PROTECTED]
Subject: Re: beginniner's 5.6.1 latin1<->utf8 question

On Fri, Jan 10, 2003 at 07:28:00PM +0100, Merijn van den Kroonenberg
wrote:
> You might be looking for these:
>
> # ISO 8859-1 to UTF-8
> s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
>
> # UTF-8 to ISO 8859-1
> s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
>
> I think that will work (they are not mine, so don't blame me if not
> ;-)

They are mine :-) so I feel free to say that they don't do &#NNN;
conversion...
but they certainly could be changed to work so.

I am a beginner as well, with the task of finding and counting the
non-ASCII characters in a UTF-8 text. How do I do this?
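
To make the question concrete, here is a rough sketch of the kind of
thing I have in mind. It is only a guess, not a tested solution: it
assumes the text is held as raw, well-formed UTF-8 bytes (no decoding
layer), and count_non_ascii is a name I made up for illustration. Each
non-ASCII character should then be one lead byte followed by one to
three continuation bytes in the range 0x80-0xBF, in the same spirit as
the substitutions quoted above.

```perl
use strict;
use warnings;

# Count non-ASCII characters in a string of raw UTF-8 bytes.
# Assumption: the input is well-formed UTF-8; this does not validate it.
# count_non_ascii is a hypothetical helper name, not from the thread.
sub count_non_ascii {
    my ($bytes) = @_;
    my $count = 0;
    # One match per multibyte character: a lead byte followed by
    # 1, 2, or 3 continuation bytes (0x80-0xBF).
    $count++ while $bytes =~ /[\xC2-\xDF][\x80-\xBF]
                             |[\xE0-\xEF][\x80-\xBF]{2}
                             |[\xF0-\xF4][\x80-\xBF]{3}/gx;
    return $count;
}

# "naive cafe" with i-diaeresis and e-acute, plus a euro sign, as UTF-8 bytes
my $text = "na\xC3\xAFve caf\xC3\xA9 \xE2\x82\xAC5";
print count_non_ascii($text), "\n";   # prints 3
```

Matching whole sequences rather than single high bytes means a
two-byte or three-byte character is counted once, not once per byte.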
