ID:               47480
User updated by:  sehh at ionos dot gr
Reported By:      sehh at ionos dot gr
Status:           Open
Bug Type:         PCRE related
Operating System: Linux
PHP Version:      5.2.8
New Comment:

I forgot the capital accented characters, so the above should read:

"Η" == "ή" == "η" == "Ή"
"Α" == "ά" == "α" == "Ά"
etc.

Remember that in Greek, the accent may be omitted from capital letters
or may be included for the first letter only. So that should produce
proper case-insensitive results.


Previous Comments:
------------------------------------------------------------------------

[2009-03-09 14:54:32] sehh at ionos dot gr

The PCRE library is wrong then.

"Η" is correctly defined in Unicode as "η", but the library should also
understand the meaning of "Η" == "ή" == "η".

This counts for all Greek accents: "Α" == "ά" == "α" etc.

Otherwise, the parameter "/i" is useless for the Greek language, and
that's why the current implementation does not work for Greek.

Thank you for taking the time to look into this issue, much
appreciated.

------------------------------------------------------------------------

[2009-03-09 14:31:03] mmcnickle at gmail dot com

You're absolutely correct, I do not speak Greek. But neither does the
PCRE library. It determines the uppercase/lowercase relationship
between characters solely using Unicode properties.

The lowercase of Ç is defined in Unicode as ç [1], not Þ. Therefore the
case-insensitive search will not match.

[1] http://www.fileformat.info/info/unicode/char/00c7/index.htm

------------------------------------------------------------------------

[2009-03-09 12:16:43] sehh at ionos dot gr

Obviously you have no idea what you are talking about and obviously you
don't speak Greek or know anything about the Greek language.

The word "κινητήρα" is capitalized as "ΚΙΝΗΤΗΡΑ".

What you are suggesting is like capitalizing the word "engine" as
"ENGiNE".

Obviously, there is no word "ENGiNE", the same way there is no word
"ΚΙΝΗΤήΡΑ" :)

------------------------------------------------------------------------

[2009-03-09 11:59:53] mmcnickle at gmail dot com

The test case is wrong and the bug should be closed.

The upper case search target is misspelled.

$target1 = "ÊÉÍÇÔÇÑÁ";
$target2 = "êéíçôÞñá";

should read

$target1 = "ÊÉÍÇÔÞÑÁ";
$target2 = "êéíçôÞñá";

(note the replacement of the second Ç with a capital Thorn (U+00DE)).

With this change I get the expected result:

Actual Result
-------------
Searching for: ÊÉÍÇÔÞÑÁ
Result string: Ôï êõñßùò ôìÞìá ôïõ itworks, áõôü ðïõ ðåñéëáìâÜíåé ôïõò êõëßíäñïõò
Found and replaced: 1

Searching for: êéíçôþñá
Result string: Ôï êõñßùò ôìÞìá ôïõ itworks, áõôü ðïõ ðåñéëáìâÜíåé ôïõò êõëßíäñïõò
Found and replaced: 1

------------------------------------------------------------------------

[2009-02-23 13:32:39] sehh at ionos dot gr

Description:
------------
preg_replace with the "/i" modifier (case-insensitive search) does not
do a case-insensitive search for UTF-8 Greek characters, while it works
fine for English characters.

Reproduce code:
---------------
<?php

$string = "Το κυρίως τμήμα του κινητήρα, αυτό που περιλαμβάνει τους κυλίνδρους"; // UTF-8 string in Greek language
$target1 = "ΚΙΝΗΤΗΡΑ"; // Target string to search for (capitalized)
$target2 = "κινητήρα"; // Target string to search for (small letters)
$replace = "itworks"; // Replace with this string

$rc = preg_replace("/$target1/imsUu", $replace, $string, -1, $counter); // Execute search for target1 and replace

echo "\nSearching for: ".$target1."\n"; // Report output
echo "Result string: ".$rc."\n";
echo "Found and replaced: ".$counter."\n";

$rc = preg_replace("/$target2/imsUu", $replace, $string, -1, $counter); // Execute search for target2 and replace

echo "\nSearching for: ".$target2."\n"; // Report output
echo "Result string: ".$rc."\n";
echo "Found and replaced: ".$counter."\n\n";

?>

Expected result:
----------------
I expect "Found and replaced" to be "1" in both cases, since the
expression is not case sensitive.

Actual result:
--------------
$ php -f test.php

Searching for: ΚΙΝΗΤΗΡΑ
Result string: Το κυρίως τμήμα του κινητήρα, αυτό που περιλαμβάνει τους κυλίνδρους
Found and replaced: 0

Searching for: κινητήρα
Result string: Το κυρίως τμήμα του itworks, αυτό που περιλαμβάνει τους κυλίνδρους
Found and replaced: 1

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=47480&edit=1
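
A possible workaround, sketched here and not taken from the report:
since "/i" only folds case and does not ignore the Greek accent, the
pattern itself can be widened so that each Greek vowel also matches its
accented forms. The helper name greek_accent_insensitive_pattern() and
the vowel groups below are assumptions made for illustration. The
sketch also assumes PHP 7 or later, a PCRE build with Unicode support,
and a source file saved as UTF-8.

```php
<?php

// Hypothetical helper (not part of PHP or PCRE): builds a pattern in which
// every Greek vowel is replaced by a character class holding its accented
// and unaccented forms, in both cases.
function greek_accent_insensitive_pattern(string $word): string
{
    // Assumed vowel groups; extend them if other characters are needed.
    $groups = ['αάΑΆ', 'εέΕΈ', 'ηήΗΉ', 'ιίϊΐΙΊΪ', 'οόΟΌ', 'υύϋΰΥΎΫ', 'ωώΩΏ'];

    // Map each single character to the class of its whole group.
    $map = [];
    foreach ($groups as $group) {
        foreach (preg_split('//u', $group, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
            $map[$ch] = '[' . $group . ']';
        }
    }

    // Rebuild the word character by character; anything that is not a
    // known vowel is quoted so it is matched literally.
    $pattern = '';
    foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        $pattern .= $map[$ch] ?? preg_quote($ch, '/');
    }

    // "i" folds the case of the remaining letters, "u" enables UTF-8 mode.
    return '/' . $pattern . '/iu';
}

$string = "Το κυρίως τμήμα του κινητήρα, αυτό που περιλαμβάνει τους κυλίνδρους";

$result = preg_replace(
    greek_accent_insensitive_pattern("ΚΙΝΗΤΗΡΑ"),
    "itworks",
    $string,
    -1,
    $count
);

echo "Result string: ", $result, "\n";
echo "Found and replaced: ", $count, "\n";
```

With this approach the unaccented capitalized target and the accented
lowercase target are both expanded to the same set of character classes,
so either one should match the word in the test string. The cost is a
longer pattern and a vowel table that has to be maintained by hand.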