Re: [pcre-dev] [Bug 791] New: UTF-8 support does not work on EBCDIC platforms

Zack Weinberg Tue, 16 Dec 2008 15:55:58 -0800

On Tue, Dec 16, 2008 at 1:18 PM, Philip Hazel <[email protected]> wrote:
> On Tue, 16 Dec 2008, Martin Jerabek wrote:
>
>> The real problem is now that the PCRE code compares the characters
>> taken from the pattern with normal C character literals such as '\\',
>> '*', etc. This works fine on non-EBCDIC (ASCII) platforms because the
>> 7-bit ASCII subset of UTF-8 is identical to ASCII.
>
> Exactly - I thought that was the whole point of UTF-8.


There is an encoding called UTF-EBCDIC that is probably what you
(Martin) really want to be using here (see
http://www.unicode.org/unicode/reports/tr16/). It is a modification of
UTF-8 with the property that the characters encoded as a single byte
map to a particular EBCDIC code page the same way that UTF-8
single-byte characters map to ASCII.
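
To make the mismatch concrete, here is a rough C sketch (not PCRE's
actual code). It assumes EBCDIC code page 037 for the EBCDIC byte
values; other code pages may differ. The function names
host_literals_match_utf8 and count_literal_bytes are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Byte values of two pattern metacharacters.
   The CP037_* values assume EBCDIC code page 037. */
enum {
    UTF8_BACKSLASH  = 0x5C,  /* identical to ASCII */
    UTF8_ASTERISK   = 0x2A,  /* identical to ASCII */
    CP037_BACKSLASH = 0xE0,  /* assumption: code page 037 */
    CP037_ASTERISK  = 0x5C   /* assumption: code page 037 */
};

/* Returns 1 if the compiler's execution character set encodes '\\'
   and '*' with the same bytes UTF-8 uses, 0 otherwise. This is what
   comparing pattern bytes against C literals silently relies on. */
int host_literals_match_utf8(void)
{
    return (unsigned char)'\\' == UTF8_BACKSLASH &&
           (unsigned char)'*'  == UTF8_ASTERISK;
}

/* Hypothetical helper: counts the bytes of a UTF-8 encoded pattern
   that compare equal to a C character literal on this host. */
size_t count_literal_bytes(const unsigned char *pattern, size_t len,
                           char literal)
{
    size_t count = 0;
    size_t i;

    assert(pattern != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (pattern[i] == (unsigned char)literal)
            count++;
    }
    return count;
}
```

On an ASCII host the first function returns 1 and the comparisons
behave as expected. Under the code page 037 assumption, the UTF-8 byte
for a backslash (0x5C) is the same value as the EBCDIC asterisk, so a
comparison against '*' would match where '\\' was intended.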

zw

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
