On Tue, Dec 16, 2008 at 1:18 PM, Philip Hazel <[email protected]> wrote:
> On Tue, 16 Dec 2008, Martin Jerabek wrote:
>
>> The real problem is now that the PCRE code compares the characters
>> taken from the pattern with normal C character literals such as '\\',
>> '*', etc. This works fine on non-EBCDIC (ASCII) platforms because the
>> 7-bit ASCII subset of UTF-8 is identical to ASCII.
>
> Exactly - I thought that was the whole point of UTF-8.

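To make the quoted problem concrete, here is a minimal sketch of the comparison going wrong. It is not PCRE code: `is_literal` is a hypothetical helper that stands in for a test such as `c == '\\'`, and the EBCDIC byte values are those of code page 037, picked only as an example (other EBCDIC code pages may differ).

```c
#include <stdbool.h>

/* Byte values of '\\' and '*' in each encoding. The EBCDIC values are
   those of code page 037, used here purely as an illustrative assumption. */
enum {
    ASCII_BACKSLASH  = 0x5C,
    ASCII_ASTERISK   = 0x2A,
    EBCDIC_BACKSLASH = 0xE0,
    EBCDIC_ASTERISK  = 0x5C
};

/* Hypothetical helper: models a comparison like (c == '\\'), with the
   value the compiler would give the literal passed in explicitly, so
   both kinds of platform can be simulated on one machine. */
static bool is_literal(unsigned char pattern_byte, unsigned char literal_value)
{
    return pattern_byte == literal_value;
}
```

A backslash in a UTF-8 pattern is the byte 0x5C. `is_literal(0x5C, ASCII_BACKSLASH)` is true, so the test works on an ASCII platform. `is_literal(0x5C, EBCDIC_BACKSLASH)` is false, and `is_literal(0x5C, EBCDIC_ASTERISK)` is true, so with these values an EBCDIC build would take the pattern's backslash for an asterisk.
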
There is an encoding called UTF-EBCDIC that is probably what you (Martin)
really want to be using here (see
http://www.unicode.org/unicode/reports/tr16/). It is a modification of
UTF-8 with the property that the characters encoded as a single byte map
to a particular EBCDIC code page the same way that UTF-8 single-byte
characters map to ASCII.

zw

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
