Hey, while running livedocs with Hebrew on PHP 5, I found that for some 8-bit encodings other than ISO-8859-1, libxml simply replaces every byte in the 80-FF range with a question mark.
The windows-1255 charset is one of those affected by this behavior, so we have to mark all the Hebrew XML files as ISO-8859-1, but the old build system doesn't like that. So here we have one more incompatibility between the old and new systems. Other translation projects may want to use this test script to look for potential conflicts (PHP 5 + libxml only):

<?php
// Pick one charset together with the byte range to test.
//$charset = "ISO-8859-1";   $a = 128; $aend = 255;
//$charset = "WINDOWS-1255"; $a = 224; $aend = 250;
$charset = "iso-8859-8"; $a = 224; $aend = 250;

// Build a string containing every byte in the chosen range.
$ord = '';
for (; $a <= $aend; $a++) {
    $ord .= chr($a);
}
echo "$ord\n";

// Wrap those bytes in a small document that declares the charset.
$xml = <<<XOF
<?xml version="1.0" encoding="$charset"?>
<chapter>$ord</chapter>
XOF;
//$xml = utf8_encode($xml);

$p = xml_parser_create();
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($p, 'start_elem', 'end_elem');
xml_set_character_data_handler($p, 'cdata');

if (!xml_parse($p, $xml, true)) {
    // Report where the parse failed and dump the input.
    printf("XML: %d:%d %s\n",
        xml_get_current_line_number($p),
        xml_get_current_column_number($p),
        xml_error_string(xml_get_error_code($p))
    );
    $lines = explode("\n", $xml);
    $l = xml_get_current_line_number($p);
    echo "\nLine: $l is <b>" . htmlentities($lines[$l - 1]) . "</b><br />\n";
    echo '<pre>';
    echo htmlentities($xml);
    echo '</pre>';
}
xml_parser_free($p);

function start_elem($parser, $tagname, $attributes) {}
function end_elem($parser, $tagname) {}

// Echo the character data as the parser hands it back, so it can be
// compared with the raw bytes printed above.
function cdata($parser, $data)
{
    echo $data;
}
?>