Hey,

While running livedocs with Hebrew on PHP 5, I found that for some 8-bit
encodings other than ISO-8859-1, libxml just replaces every byte in the
80-FF range with a question mark.

The windows-1255 charset is one of those affected by this behavior, so we
have to mark all the Hebrew XML files as ISO-8859-1, but the old build
system doesn't like that.
So here we have one more incompatibility between the old and new systems.

Other translation projects may want to use this test script to look for
potential conflicts (PHP 5 + libxml only):

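<?php
// Pick one charset and the byte range to test; uncomment a line to switch.
//$charset = "ISO-8859-1";$a=128;$aend=255;
//$charset = "WINDOWS-1255";$a=224;$aend=250;
$charset = "iso-8859-8";$a=224;$aend=250;

// Build a string containing every byte in the chosen range.
$ord = '';
for (; $a <= $aend; $a++) {
    $ord .= chr($a);
}

// Print the raw bytes first, for comparison with the parser output below.
echo "$ord\n";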
$xml = <<< XOF
<?xml version="1.0" encoding="$charset"?>
<chapter>$ord</chapter>
XOF;

//$xml = utf8_encode($xml);
$p = xml_parser_create();
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($p, 'start_elem', 'end_elem');
xml_set_character_data_handler($p, 'cdata');

if (!xml_parse($p, $xml, true)) {
 printf("XML: %d:%d %s\n",
   xml_get_current_line_number($p),
   xml_get_current_column_number($p),
   xml_error_string(xml_get_error_code($p))
   );
 $lines = explode("\n", $xml);
 $l = xml_get_current_line_number($p);
 echo "\nLine: $l is <b>" . htmlentities($lines[$l-1]) . "</b><br />\n";
 echo '<pre>';
 echo htmlentities($xml);
 echo '</pre>';
}
xml_parser_free($p);
// Handlers: tags are ignored; character data is echoed as the parser delivers it.
function start_elem($parser, $tagname, $attributes){}
function end_elem($parser, $tagname){}
function cdata($parser, $data) { echo $data; }
?>
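
One alternative that may be worth testing, instead of relabeling the files as
ISO-8859-1, is to re-encode the XML to UTF-8 before it reaches the parser.
This is only a sketch and not something the build systems do today: it assumes
the iconv extension is loaded and that the underlying iconv library knows the
charset name. xml_to_utf8() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert an XML string from $charset to UTF-8 and
// rewrite its encoding declaration to match. Returns false if iconv fails.
function xml_to_utf8($xml, $charset)
{
    // Assumption: the iconv extension is available and supports $charset.
    $utf8 = iconv($charset, 'UTF-8', $xml);
    if ($utf8 === false) {
        return false;
    }
    // Replace the declared encoding so it agrees with the new bytes.
    return preg_replace(
        '/^(<\?xml[^>]*encoding=)(["\'])[^"\']*\2/',
        '${1}${2}UTF-8${2}',
        $utf8,
        1
    );
}

// Same test data as the script above: bytes 224-250 (the Hebrew letters).
$charset = 'ISO-8859-8';
$ord = '';
for ($a = 224; $a <= 250; $a++) {
    $ord .= chr($a);
}
$xml = "<?xml version=\"1.0\" encoding=\"$charset\"?>\n<chapter>$ord</chapter>";

$converted = xml_to_utf8($xml, $charset);
echo $converted, "\n";
```

The converted string can then be passed to xml_parse() in place of $xml in the
test script, to check whether the question marks go away on a given setup.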
