ID:               29711
 Updated by:       [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
 Status:           Feedback
 Bug Type:         XML related
 Operating System: ALL
 PHP Version:      5.0.1
 New Comment:

We are still early enough in PHP5 deployment to fix stuff like this and
not worry too much about 5.0.x BC breakage.  Especially for something
which already has broken BC with PHP4.  People porting their stuff are
already screwed.  So we either revert to PHP4 behaviour to remove the
BC break or we do it right and default to UTF-8.  Leaving it as-is
because of BC worries to 5.0.x shouldn't even be considered.  
My vote is to default it to UTF-8.


Previous Comments:
------------------------------------------------------------------------

[2004-08-17 09:15:32] [EMAIL PROTECTED]

The minimal-BC option would be to convert it back to the original
encoding by default. That will not break any PHP 4 or PHP 5 scripts; it
will only break the ones that already don't work. What's the problem
with that?

------------------------------------------------------------------------

[2004-08-17 08:58:46] [EMAIL PROTECTED]

Oh come on, nobody uses PHP 5 yet :) For once do things right instead
of half-assed solutions we've had for the past years because of BC
issues.

------------------------------------------------------------------------

[2004-08-17 08:51:03] [EMAIL PROTECTED]

It works correctly for everyone who uses ISO-8859-1 (or similar, or
UTF-8 restricted to the ISO-8859-1 range) as source encoding, or who
specified UTF-8 as output encoding. So for the majority (I'd say) it
works as expected. If we change the default to UTF-8 (which would of
course be the correct solution), it will break a lot of people's code.

The encoding thingie in ext/xml in PHP 4 was always broken, IMHO. We
unfortunately missed the chance to implement it more correctly with
slight BC breaks for PHP 5.0.0...

------------------------------------------------------------------------

[2004-08-17 08:28:58] [EMAIL PROTECTED]

I'm for breaking it and making it output UTF-8 by default like the
domxml stuff does. Consistency is a good thing, and as it doesn't work
"correctly" for 5.0.0 and 5.0.1 anyway, I'd say we fix this.

------------------------------------------------------------------------

[2004-08-17 08:19:25] [EMAIL PROTECTED]

It's not a bug per se, it's more a BC break and/or documentation
problem.

As libxml2 in PHP 5 detects the encoding automatically (which is the
correct behaviour anyway), you don't have to specify it.

Therefore, in PHP 5, the 1st parameter to xml_parser_create() only
specifies the output encoding, which defaults to ISO-8859-1. If you
specify "UTF-8" there, you at least get UTF-8 encoded strings and can
convert them to Windows-1255.
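
To illustrate the workaround described above, here is a minimal sketch.
It assumes the iconv extension is loaded and knows the name
WINDOWS-1255; the sample document and the variable names are made up
for illustration and are not taken from the original report.

```php
<?php
// Hypothetical sample document: UTF-8 source containing the Hebrew
// letter alef (U+05D0, bytes D7 90 in UTF-8).
$xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<msg>\xD7\x90</msg>";

// In PHP 5 the argument only selects the output encoding; the source
// encoding is detected from the document itself.
$parser = xml_parser_create('UTF-8');
xml_parse_into_struct($parser, $xml, $values, $index);

// Character data of the first (and only) element, delivered as UTF-8.
$utf8 = $values[0]['value'];

// Convert to the encoding the application actually wants.
// Assumption: iconv supports WINDOWS-1255 here; it returns false if not.
$converted = iconv('UTF-8', 'WINDOWS-1255', $utf8);

if ($converted === false) {
    echo "conversion to Windows-1255 failed\n";
} else {
    echo bin2hex($utf8), ' -> ', bin2hex($converted), "\n";
}
```

Passing the output encoding explicitly also means the script does not
depend on whatever default a given PHP version picks.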

So, what to do now? If we change the current behaviour (output encoding
defaults to ISO-8859-1), we break BC with 5.0.0 and 5.0.1; if we leave
it, it's a BC break with 4.x. But IMHO the behaviour of PHP 4 was wrong
anyway (not respecting the source encoding specified in the XML
document). On the other hand, defaulting to ISO-8859-1 was also not a
very bright idea back then...

I'm in favor of leaving it as it is and clearly documenting it.


------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/29711