Re: [pcre-dev] First slot of the offset vector have a wrong value when PCRE_ERROR_SHORTUTF8 rises

Philip Hazel Sun, 06 Feb 2011 09:47:46 -0800

On Sat, 5 Feb 2011, ND wrote:

> > I wrote:
> > I think the problem may be resolved if PCRE will put a position of  
> > incomplete UTF-8 character as start offset.
> 
> 
> I must rectify the omission: not "as start offset" but "INSTEAD start  
> offset".


I'm afraid I don't understand why this is needed. The position of the
incomplete UTF-8 character is easy to find. Start at the end of the byte
string and look backwards until you find a byte that has both of its top
two bits set, i.e. (byte & 0xc0) == 0xc0. That byte is the first byte of
the incomplete UTF-8 character.
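
A minimal sketch of that backward scan, for illustration only. The helper
name last_utf8_lead_offset is hypothetical and is not part of the PCRE API.
It assumes the caller has already received PCRE_ERROR_SHORTUTF8, so the
subject is known to end in a truncated character.

```c
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
   Scans backwards from the end of subject[0..length) and returns the
   offset of the last byte whose top two bits are both set (a UTF-8 lead
   byte, 0xc0..0xff). Returns -1 if no such byte exists.
   Assumption: the subject ends with an incomplete UTF-8 character, so
   the bytes skipped on the way back are continuation bytes (10xxxxxx). */
static int last_utf8_lead_offset(const unsigned char *subject, int length)
{
    int i = length;
    while (i > 0) {
        i--;
        if ((subject[i] & 0xc0) == 0xc0)
            return i;   /* first byte of the incomplete character */
    }
    return -1;          /* no lead byte found */
}
```

For the four-byte subject 'a', 'b', 0xe2, 0x82 (a three-byte character
missing its last byte), the scan skips the continuation byte 0x82 and
stops at offset 2.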

Putting something other than a match start into the offsets vector 
rather breaks the philosophy of PCRE.

Philip

-- 
Philip Hazel

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
