Hi!
I'm currently developing a nice script that generates OpenOffice SXW files
by filling the content.xml (which is UTF-8 encoded) with database content.
While doing this I found that utf8_encode('“') (character code 147)
returns the two bytes C2 93. But when I checked the whole result in
OpenOffice, the '“' is displayed as a square (unknown character?!). So I
ran some tests with UTF-8 conversion (including the mb_* functions) and
noticed that the characters between 128 and 160 returned by utf8_encode()
don't seem to match the standard. As mentioned above, '“' is returned as
the bytes C2 93 but should be E2 80 9C (which is what you get when using
UltraEdit for the conversion).
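A possible explanation, sketched under one assumption: utf8_encode() is documented to treat its input as ISO-8859-1, where byte 147 is the control character U+0093, whereas in Windows-1252 the same byte is the left double quotation mark U+201C. The helper codepoint_to_utf8() below is a hypothetical name, not a PHP built-in; it only illustrates how the two code points turn into different UTF-8 byte sequences.

```php
<?php
// Hypothetical helper: encode one Unicode code point as a UTF-8 byte string.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        // 1 byte: 0xxxxxxx
        return chr($cp);
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Byte 147 read as ISO-8859-1 is U+0093 (a control character).
echo bin2hex(codepoint_to_utf8(0x0093)), "\n"; // c293

// Byte 147 read as Windows-1252 is U+201C (left double quotation mark).
echo bin2hex(codepoint_to_utf8(0x201C)), "\n"; // e2809c
```

If this reading is right, the output of utf8_encode() is valid UTF-8, but for a code point that OpenOffice has no glyph for, which would account for the square.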
Can anyone give me an explanation here?
I'm not familiar with this UTF-8 / bit-conversion stuff, but I don't think
PHP does what it's supposed to do here. As a first workaround I simply
wrote a custom_utf8_encode() that uses its own character map to override this
misbehaviour (see below). Can someone help me out with this strange bug?!
Regards
Bjoern Kraus
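function custom_utf8_encode($str)
{
    // UTF-8 byte sequences for the Windows-1252 characters 128-159.
    // Codes 129, 141, 143, 144 and 157 are unassigned and map to ''.
    $chrMap = array(
        128 => "\xE2\x82\xAC", 129 => '',             130 => "\xE2\x80\x9A", 131 => "\xC6\x92",
        132 => "\xE2\x80\x9E", 133 => "\xE2\x80\xA6", 134 => "\xE2\x80\xA0", 135 => "\xE2\x80\xA1",
        136 => "\xCB\x86",     137 => "\xE2\x80\xB0", 138 => "\xC5\xA0",     139 => "\xE2\x80\xB9",
        140 => "\xC5\x92",     141 => '',             142 => "\xC5\xBD",     143 => '',
        144 => '',             145 => "\xE2\x80\x98", 146 => "\xE2\x80\x99", 147 => "\xE2\x80\x9C",
        148 => "\xE2\x80\x9D", 149 => "\xE2\x80\xA2", 150 => "\xE2\x80\x93", 151 => "\xE2\x80\x94",
        152 => "\xCB\x9C",     153 => "\xE2\x84\xA2", 154 => "\xC5\xA1",     155 => "\xE2\x80\xBA",
        156 => "\xC5\x93",     157 => '',             158 => "\xC5\xBE",     159 => "\xC5\xB8");
    $newStr = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $chrVal = ord($str[$i]);
        if ($chrVal > 127 && $chrVal < 160) {
            // Bytes 128-159: take the UTF-8 sequence from the override map.
            $newStr .= $chrMap[$chrVal];
        } else {
            // Everything else: utf8_encode() already gives the expected result.
            $newStr .= utf8_encode($str[$i]);
        }
    }
    return $newStr;
}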
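The hand-written map might not be needed at all. The following is a minimal sketch, assuming the database content is really Windows-1252 (the byte values 128-159 suggest this, but it is not confirmed) and that the mbstring extension is loaded. cp1252_to_utf8() is a hypothetical name chosen for illustration.

```php
<?php
// Hypothetical helper: convert a Windows-1252 string to UTF-8 via mbstring.
// Assumes the input really is Windows-1252 and that ext/mbstring is available.
function cp1252_to_utf8(string $str): string
{
    // Argument order is: string, target encoding, source encoding.
    return mb_convert_encoding($str, 'UTF-8', 'Windows-1252');
}

// Byte 147 becomes U+201C, the left double quotation mark.
echo bin2hex(cp1252_to_utf8("\x93")), "\n"; // e2809c
```

Naming the source encoding explicitly is the point here: the same call with 'ISO-8859-1' as the source would give C2 93 for byte 147, the same bytes utf8_encode() produces.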
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php