On Wed, 9 Feb 2011, ND wrote:

> >Putting something other than a match start into the offsets vector rather
> > breaks the philosophy of PCRE.
> >
> It's useful to give the main application information about the position
> when PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORTUTF8 occurs. It need not
> necessarily be returned in the offsets vector; it could be in another
> memory block. This information can help the main application to analyze
> and fix an erroneous stream.

OK, I've changed my mind and decided that the offsets vector can be 
used. I also decided that if this was happening, I should do the job 
*properly*. I have just committed a patch which behaves like this:

If the size of the ovector is at least 2, then, for PCRE_ERROR_BADUTF8 
or PCRE_ERROR_SHORTUTF8,

  ovector[0] is set to the byte offset of the first byte of the invalid
             character
  ovector[1] is set to a reason code
  
There are 21 different reason codes, documented in the pcreapi man page. 
They include codes for "short by n bytes" (where n is 1-5), so in fact 
PCRE_ERROR_SHORTUTF8 is no longer needed. However, I have not removed 
it because that would break backwards compatibility.
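
As a rough illustration of the reporting convention only (this is not the
committed PCRE code; the function names, the -1 return value and the three
reason values below are made up for the example, and the check is
deliberately simplified), a validator that fills a two-element vector in
this way might look something like:

```c
#include <stddef.h>

/* Illustrative values only; the real reason codes are listed in the
   pcreapi man page. */
enum {
    DEMO_TRUNCATED        = 1,  /* string ends in the middle of a character */
    DEMO_BAD_CONTINUATION = 2,  /* a following byte is not 10xxxxxx */
    DEMO_BAD_LEAD         = 3   /* first byte cannot start a character */
};

/* Record offset and reason if the caller supplied at least two slots,
   then return a negative (hypothetical) error code. */
static int demo_fail(int *ovector, int ovecsize, int offset, int reason)
{
    if (ovector != NULL && ovecsize >= 2) {
        ovector[0] = offset;   /* byte offset of the invalid character */
        ovector[1] = reason;   /* why it was rejected */
    }
    return -1;
}

/* Simplified UTF-8 check: it looks only at lead-byte ranges and at the
   form of continuation bytes; it does not catch every overlong or
   surrogate encoding. Returns 0 if the string passes. */
static int demo_check_utf8(const unsigned char *s, int len,
                           int *ovector, int ovecsize)
{
    int i = 0;
    while (i < len) {
        unsigned char c = s[i];
        int need, k;           /* need = continuation bytes expected */

        if (c < 0x80) need = 0;
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c >= 0xE0 && c <= 0xEF) need = 2;
        else if (c >= 0xF0 && c <= 0xF4) need = 3;
        else return demo_fail(ovector, ovecsize, i, DEMO_BAD_LEAD);

        /* Fewer bytes remain than the lead byte promises. */
        if (need > len - 1 - i)
            return demo_fail(ovector, ovecsize, i, DEMO_TRUNCATED);

        for (k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return demo_fail(ovector, ovecsize, i, DEMO_BAD_CONTINUATION);

        i += need + 1;
    }
    return 0;
}
```

A caller would pass a two-element int array and, on a negative return,
read the offset from element 0 and the reason from element 1, for example
to skip or replace the damaged bytes before retrying.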

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 
