Edit report at https://bugs.php.net/bug.php?id=60423&edit=1
 ID:                 60423
 User updated by:    amal dot samally at gmail dot com
 Reported by:        amal dot samally at gmail dot com
 Summary:            Segmentation fault with the UTF-8 check regexp in some cases
-Status:             Feedback
+Status:             Open
 Type:               Bug
 Package:            PCRE related
 Operating System:   Linux
 PHP Version:        5.3.8
 Block user comment: N
 Private report:     N

 New Comment:

I don't think so. Also, changing pcre.backtrack_limit / pcre.recursion_limit
has no effect.


Previous Comments:
------------------------------------------------------------------------
[2011-12-01 10:10:52] larue...@php.net

See #41638; it may be the same issue.

------------------------------------------------------------------------
[2011-12-01 09:04:37] amal dot samally at gmail dot com

Description:
------------
I'm using the regexp below to test whether a string is valid UTF-8, but for
some inputs it causes a segmentation fault.

Examples of strings that trigger the crash:
http://samally.ru/php_pcre_segmentation_fault/test1.txt
http://samally.ru/php_pcre_segmentation_fault/test2.txt

Test script:
---------------
<?php
$string = file_get_contents('http://samally.ru/php_pcre_segmentation_fault/test1.txt');
// $string = file_get_contents('http://samally.ru/php_pcre_segmentation_fault/test2.txt');

// Tests whether a string is a valid UTF-8 encoded string.
// @link http://w3.org/International/questions/qa-forms-utf-8.html
// The x modifier makes "#" start a comment that runs to the end of the
// line, so each alternative must stay on its own line.
$r = preg_match('~^(?:
      [\x09\x0A\x0D\x20-\x7E]              # ASCII without control characters
    | [\xC2-\xDF][\x80-\xBF]               # non-overlong 2-byte
    |  \xE0[\xA0-\xBF][\x80-\xBF]          # excluding overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # straight 3-byte
    |  \xED[\x80-\x9F][\x80-\xBF]          # excluding surrogates
    |  \xF0[\x90-\xBF][\x80-\xBF]{2}       # planes 1-3
    | [\xF1-\xF3][\x80-\xBF]{3}            # planes 4-15
    |  \xF4[\x80-\x8F][\x80-\xBF]{2}       # plane 16
)*$~DSXx', $string);
var_dump($r);

------------------------------------------------------------------------
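A possible workaround, sketched under an assumption: if the crash is the stack exhaustion discussed in #41638 (PCRE without JIT can recurse once per repetition of the `(?:...)*` group, so a long enough subject may overflow the C stack), then checking the same byte table with an ordinary loop avoids the deep recursion altogether. The function name `is_w3c_utf8` is hypothetical and not part of PHP; it is meant to accept exactly the byte sequences the regexp in the report accepts, including the empty string. It avoids short array syntax so it should also run on PHP 5.3.

```php
<?php
// Hypothetical helper (not a PHP built-in): validates $s against the same
// byte table as the W3C regexp, using a loop instead of a repeated PCRE
// group, so stack depth does not grow with the length of the input.
function is_w3c_utf8($s)
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        // Single byte: TAB, LF, CR, or printable ASCII 0x20-0x7E.
        if ($b === 0x09 || $b === 0x0A || $b === 0x0D || ($b >= 0x20 && $b <= 0x7E)) {
            $i++;
            continue;
        }
        // The lead byte fixes the sequence length ($n) and the allowed
        // range of the second byte; later bytes are always 0x80-0xBF.
        if ($b >= 0xC2 && $b <= 0xDF) {
            $n = 2; $lo = 0x80; $hi = 0xBF;   // non-overlong 2-byte
        } elseif ($b === 0xE0) {
            $n = 3; $lo = 0xA0; $hi = 0xBF;   // excluding overlongs
        } elseif (($b >= 0xE1 && $b <= 0xEC) || $b === 0xEE || $b === 0xEF) {
            $n = 3; $lo = 0x80; $hi = 0xBF;   // straight 3-byte
        } elseif ($b === 0xED) {
            $n = 3; $lo = 0x80; $hi = 0x9F;   // excluding surrogates
        } elseif ($b === 0xF0) {
            $n = 4; $lo = 0x90; $hi = 0xBF;   // planes 1-3
        } elseif ($b >= 0xF1 && $b <= 0xF3) {
            $n = 4; $lo = 0x80; $hi = 0xBF;   // planes 4-15
        } elseif ($b === 0xF4) {
            $n = 4; $lo = 0x80; $hi = 0x8F;   // plane 16
        } else {
            // Other control chars, DEL, stray continuation bytes, C0/C1, F5-FF.
            return false;
        }
        if ($i + $n > $len) {
            return false;  // truncated multi-byte sequence
        }
        $b2 = ord($s[$i + 1]);
        if ($b2 < $lo || $b2 > $hi) {
            return false;
        }
        for ($k = 2; $k < $n; $k++) {
            $c = ord($s[$i + $k]);
            if ($c < 0x80 || $c > 0xBF) {
                return false;
            }
        }
        $i += $n;
    }
    return true;
}

var_dump(is_w3c_utf8("caf\xC3\xA9"));    // bool(true)
var_dump(is_w3c_utf8("\xC0\xAF"));       // bool(false): overlong encoding
var_dump(is_w3c_utf8("\xED\xA0\x80"));   // bool(false): UTF-16 surrogate
var_dump(is_w3c_utf8("a\x00b"));         // bool(false): NUL control char
```

Because the lead byte alone decides which alternative of the regexp can apply, no backtracking is ever needed, so a single forward pass should give the same answer as the pattern while using constant stack space.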