ID:               29711
 Updated by:       [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
-Status:           Open
+Status:           Feedback
 Bug Type:         XML related
 Operating System: ALL
 PHP Version:      5.0.1
 New Comment:

It's not a bug per se; it's more of a BC break and/or a documentation
problem.

Since libxml2 in PHP 5 detects the input encoding automatically (which is
the correct behaviour anyway), you don't have to specify it.

Therefore, in PHP 5, the 1st parameter to xml_parser_create() only
specifies the output encoding, which defaults to ISO-8859-1. If you
specify "UTF-8" there, you at least get UTF-8 encoded strings and can
convert them to Windows-1255.
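
As a rough sketch of that workaround (not a definitive implementation):
request UTF-8 output from the parser, collect the character data, and
convert it back with iconv(). The variable names are made up for
illustration, the closure assumes PHP 5.3 or later, and it assumes the
local iconv knows the "WINDOWS-1255" name.

```php
<?php
// Raw Windows-1255 bytes for four Hebrew letters, written as escapes so
// that the encoding of this file itself does not matter.
$hebrew = "\xF0\xF1\xF2\xF3";
$xml = '<?xml version="1.0" encoding="WINDOWS-1255"?><x>' . $hebrew . '</x>';

// Ask for UTF-8 output explicitly instead of relying on the default.
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

$utf8 = '';
xml_set_character_data_handler($parser, function ($p, $data) use (&$utf8) {
    // The handler may be called more than once per text node, so append.
    $utf8 .= $data;
});
xml_parse($parser, $xml, true);
// On PHP versions before 8.0, also call xml_parser_free($parser) here.

// Convert the UTF-8 text back to the encoding the document was written in.
$windows1255 = iconv('UTF-8', 'WINDOWS-1255', $utf8);
```

The target encoding is hard-coded here; a script that handles several
source encodings would have to track it separately.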

So, what to do now? If we change that behaviour (the output encoding
defaulting to ISO-8859-1), we break BC with 5.0.0 and 5.0.1; if we leave
it, it's a BC break with 4.x. IMHO the behaviour of PHP 4 was wrong
anyway (it did not respect the source encoding specified in the XML
document). On the other hand, defaulting to ISO-8859-1 was also not a
very bright idea back then...

I'm in favor of leaving it as it is and clearly documenting it.



Previous Comments:
------------------------------------------------------------------------

[2004-08-17 08:15:00] [EMAIL PROTECTED]

The external link gives me the opportunity to play with the HTML charset
and make sure that all readers see exactly what I see.

Anyway, here are the details: for the above script, the expat library used
in PHP 4 treats the "WINDOWS-1255" encoding as "ISO-8859-1" and does
nothing with the characters.
libxml, on the other hand, detects "windows-1255" as a known encoding,
translates it to UTF-8 (Hebrew) using iconv for internal use, and finally
PHP corrupts it at
http://cvs.php.net/co.php/php-src/ext/xml/xml.c?r=1.151#492 by simply
trying to convert it to "ISO-8859-1".

In my opinion this behavior is a bug: if we know the source encoding,
why not convert the UTF-8 back to the source encoding by default, using
the internal iconv that was used for the reverse conversion?
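
The proposal above can be approximated in userland. This is only a sketch
under stated assumptions: declared_encoding() and
cdata_in_source_encoding() are hypothetical helpers, not part of ext/xml,
the encoding is read from the XML declaration with a simple regular
expression (BOM-based detection is ignored), and the closure assumes
PHP 5.3 or later.

```php
<?php
// Hypothetical helper: return the encoding named in the XML declaration.
// Falls back to UTF-8 when no encoding is declared.
function declared_encoding($xml)
{
    $pattern = '/^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']/';
    if (preg_match($pattern, $xml, $m)) {
        return $m[1];
    }
    return 'UTF-8';
}

// Hypothetical wrapper: parse with UTF-8 output, then convert the
// collected character data back to the encoding the document declared.
function cdata_in_source_encoding($xml)
{
    $parser = xml_parser_create('UTF-8');
    $text = '';
    xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
        $text .= $data; // may be called several times per text node
    });
    xml_parse($parser, $xml, true);
    return iconv('UTF-8', declared_encoding($xml), $text);
}
```

Calling cdata_in_source_encoding($xml) on the reproduce script's document
would then hand back text in the encoding the document declares, provided
iconv supports that encoding name.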

------------------------------------------------------------------------

[2004-08-17 08:06:07] [EMAIL PROTECTED]

Please fill in the details in this bug system; do not link exclusively to
an external site.

------------------------------------------------------------------------

[2004-08-16 20:32:49] [EMAIL PROTECTED]

Description:
------------
Full details here:
http://www.phpil.net/php5xml.php

Reproduce code:
---------------
<?php
error_reporting(E_ALL);

$xml = '<?xml version="1.0" encoding="WINDOWS-1255"?><x>рсту</x>';

$p = xml_parser_create();
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($p, 'start_elem', 'end_elem');
xml_set_character_data_handler($p, 'cdata');
xml_parse($p,$xml, true);
xml_parser_free($p);


function start_elem($parser, $tagname, $attributes){}

function end_elem($parser, $tagname){}

function cdata($parser,$data) {
    echo $data;
}
?> 

Expected result:
----------------
рсту

Actual result:
--------------
????


------------------------------------------------------------------------

