On Sat, 5 Feb 2011, ND wrote:

> > I wrote:
> > I think the problem may be resolved if PCRE will put a position of
> > incomplete UTF-8 character as start offset.
>
> I must rectify the omission: not "as start offset" but "INSTEAD start
> offset".

I'm afraid I don't understand why this is needed. The position of the
incomplete UTF-8 character is easy to find. Start at the end of the byte
string and look backwards until you find a byte that has both of the 0xc0
bits set, that is, a byte whose top two bits are both 1. That is the first
byte of the incomplete UTF-8 character.

Putting something other than a match start into the offsets vector rather
breaks the philosophy of PCRE.

Philip

-- 
Philip Hazel
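
The backward scan described above can be sketched in C roughly as follows.
This is only an illustration under stated assumptions: the function name
last_utf8_lead_offset is hypothetical and is not part of the PCRE API, and
the early exit on an ASCII byte is an extra safeguard that the description
above does not mention.

```c
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
 * Scans buf[0..len) backwards and returns the offset of the nearest byte
 * that has both 0xc0 bits set, i.e. the lead byte of a multi-byte UTF-8
 * character. Returns -1 if no such byte is found. */
static long last_utf8_lead_offset(const unsigned char *buf, size_t len)
{
    size_t i = len;

    while (i > 0) {
        i--;
        if ((buf[i] & 0xc0) == 0xc0) {
            /* 11xxxxxx: first byte of a multi-byte character. */
            return (long)i;
        }
        if ((buf[i] & 0x80) == 0) {
            /* 0xxxxxxx: plain ASCII, so the string does not end inside a
             * multi-byte character. This check is an added assumption. */
            break;
        }
        /* 10xxxxxx: continuation byte, keep looking backwards. */
    }
    return -1;
}
```

For the four bytes 'a', 'b', 0xe2, 0x82 (the start of a three-byte
character with its last byte missing) the function returns 2. It is meant
to be called only when the subject is already known to end with an
incomplete character; for a string that ends with a complete multi-byte
character it returns the offset of that character's first byte.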
