From:             gehrig at ishd dot de
Operating system: Windows XP
PHP version:      5.2.6
PHP Bug Type:     Strings related
Bug description:  strcoll() does not work with UTF-8 strings on Windows

Description:
------------
The strcoll() function, which compares strings in a locale-aware manner,
does not seem to work with UTF-8 encoded strings, even when the correct
Windows locale with the UTF-8 code page (65001) is set. strcoll() always
returns 2147483647, which makes sorting an array of such strings (for
example) more or less random.
Running the same snippet with Windows-1252 (ISO-8859-1) encoded strings, or
on a Linux machine, works as expected.
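
Since the Windows-1252 case is reported to work, one possible workaround is
to convert the strings before comparing them. The following is only a
minimal sketch under that assumption: the helper names utf8ToCp1252() and
strcollUtf8() and the constant STRCOLL_FAILED are hypothetical, and treating
2147483647 as a failure value is inferred from the output shown in this
report, not from documented PHP behaviour.

```php
<?php
// Hypothetical helpers for illustration; not part of PHP or of this report.

// The value this report shows strcoll() returning for every comparison.
// Treating it as a failure marker is an assumption.
define('STRCOLL_FAILED', 2147483647);

// Convert a UTF-8 string to Windows-1252 (requires the mbstring extension).
function utf8ToCp1252($s) {
    return mb_convert_encoding($s, 'Windows-1252', 'UTF-8');
}

// usort() callback: compare two UTF-8 strings through their Windows-1252
// form, so that a single-byte LC_COLLATE locale can be used.
function strcollUtf8($a, $b) {
    $result = strcoll(utf8ToCp1252($a), utf8ToCp1252($b));
    if ($result === STRCOLL_FAILED) {
        // Fall back to a byte-wise comparison so the sort stays consistent.
        $result = strcmp($a, $b);
    }
    return $result;
}
```

To use it, LC_COLLATE would be set to a single-byte locale (on Windows
presumably something like 'German_Germany.1252', which is an assumption
here) and the callback passed to usort(). Characters that have no
Windows-1252 equivalent cannot be compared this way.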

Please note: to run the reproduce code below, the PHP file must be saved
with UTF-8 encoding!

Reproduce code:
---------------
<?php
function traceStrColl($a, $b) {
    $outValue=strcoll($a, $b);
    echo "$a $b $outValue\r\n";
    return $outValue;
}

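// Choose the German UTF-8 locale name for the platform. Only the prefix of
// PHP_OS is tested, because a substring match on 'win' also hits 'Darwin'.
$locale=(strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') ?
'German_Germany.65001' : 'de_DE.utf8';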

$string="ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜabcdefghijklmnopqrstuvwxyzäöüß";
$array=array();
for ($i=0; $i<mb_strlen($string, 'UTF-8'); $i++) {
    $array[]=mb_substr($string, $i, 1, 'UTF-8');
}
$oldLocale=setlocale(LC_COLLATE, "0");
var_dump(setlocale(LC_COLLATE, $locale));
usort($array, 'traceStrColl');
setlocale(LC_COLLATE, $oldLocale);
var_dump($array);

Expected result:
----------------
string(20) "German_Germany.65001"
a B -1
[...]
array(59) {
  [0]=>
  string(1) "a"
  [1]=>
  string(1) "A"
  [2]=>
  string(2) "ä"
  [3]=>
  string(2) "Ä"
  [4]=>
  string(1) "b"
  [5]=>
  string(1) "B"
  [6]=>
  string(1) "c"
  [7]=>
  string(1) "C"
  [8]=>
  string(1) "d"
  [9]=>
  string(1) "D"
  [10]=>
  string(1) "e"
  [11]=>
  string(1) "E"
  [12]=>
  string(1) "f"
  [13]=>
  string(1) "F"
  [14]=>
  string(1) "g"
  [15]=>
  string(1) "G"
  [16]=>
  string(1) "h"
  [17]=>
  string(1) "H"
  [18]=>
  string(1) "i"
  [19]=>
  string(1) "I"
  [20]=>
  string(1) "j"
  [21]=>
  string(1) "J"
  [22]=>
  string(1) "k"
  [23]=>
  string(1) "K"
  [24]=>
  string(1) "l"
  [25]=>
  string(1) "L"
  [26]=>
  string(1) "m"
  [27]=>
  string(1) "M"
  [28]=>
  string(1) "n"
  [29]=>
  string(1) "N"
  [30]=>
  string(1) "o"
  [31]=>
  string(1) "O"
  [32]=>
  string(2) "ö"
  [33]=>
  string(2) "Ö"
  [34]=>
  string(1) "p"
  [35]=>
  string(1) "P"
  [36]=>
  string(1) "q"
  [37]=>
  string(1) "Q"
  [38]=>
  string(1) "r"
  [39]=>
  string(1) "R"
  [40]=>
  string(1) "s"
  [41]=>
  string(1) "S"
  [42]=>
  string(2) "ß"
  [43]=>
  string(1) "t"
  [44]=>
  string(1) "T"
  [45]=>
  string(1) "u"
  [46]=>
  string(1) "U"
  [47]=>
  string(2) "ü"
  [48]=>
  string(2) "Ü"
  [49]=>
  string(1) "v"
  [50]=>
  string(1) "V"
  [51]=>
  string(1) "w"
  [52]=>
  string(1) "W"
  [53]=>
  string(1) "x"
  [54]=>
  string(1) "X"
  [55]=>
  string(1) "y"
  [56]=>
  string(1) "Y"
  [57]=>
  string(1) "z"
  [58]=>
  string(1) "Z"
}

Actual result:
--------------
string(20) "German_Germany.65001"
a B 2147483647
[...]
array(59) {
  [0]=>
  string(1) "c"
  [1]=>
  string(1) "B"
  [2]=>
  string(1) "s"
  [3]=>
  string(1) "C"
  [4]=>
  string(1) "k"
  [5]=>
  string(1) "D"
  [6]=>
  string(2) "ä"
  [7]=>
  string(1) "E"
  [8]=>
  string(1) "g"
  [9]=>
  string(1) "F"
  [10]=>
  string(1) "o"
  [11]=>
  string(1) "G"
  [12]=>
  string(1) "w"
  [13]=>
  string(1) "H"
  [14]=>
  string(1) "A"
  [15]=>
  string(1) "I"
  [16]=>
  string(1) "e"
  [17]=>
  string(1) "J"
  [18]=>
  string(1) "i"
  [19]=>
  string(1) "K"
  [20]=>
  string(1) "m"
  [21]=>
  string(1) "L"
  [22]=>
  string(1) "q"
  [23]=>
  string(1) "M"
  [24]=>
  string(1) "u"
  [25]=>
  string(1) "N"
  [26]=>
  string(1) "y"
  [27]=>
  string(1) "O"
  [28]=>
  string(2) "ü"
  [29]=>
  string(1) "P"
  [30]=>
  string(1) "b"
  [31]=>
  string(1) "Q"
  [32]=>
  string(1) "d"
  [33]=>
  string(1) "R"
  [34]=>
  string(1) "f"
  [35]=>
  string(1) "S"
  [36]=>
  string(1) "h"
  [37]=>
  string(1) "T"
  [38]=>
  string(1) "j"
  [39]=>
  string(1) "U"
  [40]=>
  string(1) "l"
  [41]=>
  string(1) "V"
  [42]=>
  string(1) "n"
  [43]=>
  string(1) "W"
  [44]=>
  string(1) "p"
  [45]=>
  string(1) "X"
  [46]=>
  string(1) "r"
  [47]=>
  string(1) "Y"
  [48]=>
  string(1) "t"
  [49]=>
  string(1) "Z"
  [50]=>
  string(1) "v"
  [51]=>
  string(2) "Ä"
  [52]=>
  string(1) "x"
  [53]=>
  string(2) "Ö"
  [54]=>
  string(1) "z"
  [55]=>
  string(2) "Ü"
  [56]=>
  string(2) "ö"
  [57]=>
  string(1) "a"
  [58]=>
  string(2) "ß"
}

-- 
Edit bug report at http://bugs.php.net/?id=46165&edit=1