Invalid Unicode characters

z . xiao Tue, 16 Sep 2003 05:38:25 -0700

Dear PERLists,

I am running Perl 5.8 and trying to filter out some invalid Unicode characters from 
Unicode-encoded texts in several South Asian languages. There are 28 such characters in my data 
(all control characters except U+FFFF, which is a noncharacter):

0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0xB, 0xC, 0xE, 0xF, 0x10, 0x11, 0x12, 0x13, 
0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F, 0xFFFF

The data is encoded as UTF-16, and I want to keep it that way after the invalid characters 
are removed. Is there an easy way to do this in Perl without corrupting the rest of 
the text? Any advice is welcome. Thanks.
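A minimal Perl sketch of one possible approach, assuming each file starts with a byte-order mark (FE FF is read as big-endian; anything else is treated as little-endian). Because all 28 listed code points lie in the Basic Multilingual Plane and outside the surrogate range 0xD800-0xDFFF, they can be dropped as individual 16-bit code units with `unpack`/`pack`, without decoding the text and without touching surrogate pairs or any other character. The function name `strip_invalid_utf16` and the command-line handling are illustrative, not taken from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The 28 code points from the post: C0 controls other than NUL (0x0),
# TAB (0x9), LF (0xA), CR (0xD) and SUB (0x1A), plus U+FFFF.
my %invalid = map { $_ => 1 }
    (0x01 .. 0x08, 0x0B, 0x0C, 0x0E .. 0x19, 0x1B .. 0x1F, 0xFFFF);

# Hypothetical helper: remove the listed characters from a UTF-16 byte
# string and return UTF-16 bytes in the same byte order.
sub strip_invalid_utf16 {
    my ($bytes) = @_;
    # Assumption: a leading FE FF marks big-endian ('n'); otherwise
    # little-endian ('v'). The BOM itself (U+FEFF) is kept.
    my $tmpl = substr($bytes, 0, 2) eq "\xFE\xFF" ? 'n*' : 'v*';
    # Each 16-bit code unit is tested on its own; surrogate halves are
    # never in %invalid, so supplementary characters survive intact.
    # A trailing odd byte, if any, is ignored by unpack.
    my @units = grep { !$invalid{$_} } unpack($tmpl, $bytes);
    return pack($tmpl, @units);
}

# Command-line use: read a file in raw mode, write the result to STDOUT.
if (@ARGV) {
    local $/;    # slurp the whole file
    open my $in, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $data = <$in>;
    close $in;
    binmode STDOUT, ':raw';
    print strip_invalid_utf16($data);
}
```

Run it as `perl strip.pl in.txt > out.txt` (file names hypothetical). An alternative is to decode with the core `Encode` module, delete the characters with a regex, and re-encode; working on code units avoids any question of how a decoder treats the noncharacter U+FFFF.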

Best,

Richard
