https://bugs.exim.org/show_bug.cgi?id=2483

Petr Pisar <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID

--- Comment #4 from Petr Pisar <[email protected]> ---
The reproducer can be reduced to:

/[^a]*\x{3c2}/i,utf
\x{d10000}\=no_utf_check

It crashes because the subject text \x{d10000} is not valid UTF-8 and, at the
same time, the no_utf_check subject modifier disables the UTF-8 validity
check. If you remove the modifier:

/[^a]*\x{3c2}/i,utf
\x{d10000}

then PCRE performs the check and explains what's wrong with the subject text:

$ pcre2test < test 
PCRE2 version 10.33 2019-04-16
/[^a]*\x{3c2}/i,utf
\x{d10000}
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0
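
To illustrate why the check complains about a "5-byte character", here is a
minimal Python sketch. It assumes that pcre2test encodes \x{d10000} with the
original, pre-RFC 3629 UTF-8 scheme (up to 6 bytes per character); that is
inferred from the error message above, not confirmed here. The helper names
encode_pre_rfc3629 and is_valid_utf8 are hypothetical and not part of PCRE2.

```python
def encode_pre_rfc3629(cp: int) -> bytes:
    """Encode a code point with the original UTF-8 scheme (1 to 6 bytes).

    Illustration only: RFC 3629 restricts UTF-8 to at most 4 bytes and to
    code points no larger than U+10FFFF.
    """
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp < 0x80:
        return bytes([cp])
    # (exclusive upper bound, lead-byte marker, number of continuation bytes)
    layouts = (
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
        (0x80000000, 0xFC, 5),
    )
    for limit, lead, trail in layouts:
        if cp < limit:
            out = [lead | (cp >> (6 * trail))]
            for i in range(trail - 1, -1, -1):
                out.append(0x80 | ((cp >> (6 * i)) & 0x3F))
            return bytes(out)
    raise ValueError("code point too large for this scheme")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 according to Python's strict decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


subject = encode_pre_rfc3629(0xD10000)
print(subject.hex(" "))        # f8 b4 90 80 80 (5 bytes)
print(is_valid_utf8(subject))  # False
```

The lead byte 0xF8 announces a 5-byte sequence, which RFC 3629 forbids, so a
validity check rejects the subject at offset 0. With the check disabled, the
matcher receives those bytes unvalidated.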

This is not a bug. It is documented behavior. From the pcre2api(3) manual:

       If you know that your pattern is a valid UTF string, and you want to skip
       this check for performance reasons, you can set the PCRE2_NO_UTF_CHECK
       option. When it is set, the effect of passing an invalid UTF string as a
       pattern is undefined. It may cause your program to crash or loop.

       Note that this option can also be passed to pcre2_match() and
       pcre_dfa_match(), to suppress UTF validity checking of the subject string.
