On Sat, 5 Feb 2011, ND wrote:

> > I wrote:
> > I think the problem may be resolved if PCRE will put a position of  
> > incomplete UTF-8 character as start offset.
> 
> 
> I must rectify the omission: not "as start offset" but "INSTEAD start  
> offset".

I'm afraid I don't understand why this is needed. The position of the
incomplete UTF-8 character is easy to find. Start at the end of the byte
string and look backwards, past any continuation bytes (top two bits 10),
until you find a byte that has both of the 0xc0 bits set. That is the first
byte of the incomplete UTF-8 character.

Putting something other than a match start into the offsets vector 
rather breaks the philosophy of PCRE.

Philip

-- 
Philip Hazel

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 