ID:               40433
 Updated by:       [EMAIL PROTECTED]
 Reported By:      consensus at gmail dot com
-Status:           Open
+Status:           Bogus
 Bug Type:         SimpleXML related
 Operating System: any
 PHP Version:      5.2.1
 New Comment:

Yes, it's defined behaviour and will stay that way.

About the "waste of cpu cycles": the library used for 
SimpleXML (and all other XML extensions) is libxml2, and this 
library converts everything to UTF-8 internally, regardless 
of what the input encoding is.

So you have to convert it back to whatever charset you want 
in your PHP script.
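
To illustrate the conversion described above, here is a minimal sketch. The sample document, the variable names, and the choice of ISO-8859-1 as the target charset are assumptions made for the example; iconv() is one way to convert back, and mb_convert_encoding() would serve equally well.

```php
<?php
// Hypothetical ISO-8859-1 input: the byte 0xE9 is "é" in that charset.
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
     . "<root><name>caf\xE9</name></root>";

$doc = simplexml_load_string($xml);

// SimpleXML returns values as UTF-8, whatever encoding the header declared.
$utf8 = (string) $doc->name;

// Convert back to the charset the script wants to work with.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $utf8);

echo bin2hex($utf8), "\n";   // 636166c3a9 ("é" as two UTF-8 bytes)
echo bin2hex($latin1), "\n"; // 636166e9   ("é" as one ISO-8859-1 byte)
```

If many values need converting, wrapping the iconv() call in a small helper keeps the conversion in one place.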


Previous Comments:
------------------------------------------------------------------------

[2007-02-10 22:07:20] consensus at gmail dot com

Description:
------------
SimpleXML behaves wrongly (yes, it is defined, but still wrong)
regarding the charset.

SimpleXML is able to read any XML file as long as the charset is given
in the XML header (encoding="...").
But when it returns values it no longer honours that encoding
and forces UTF-8.

While this might be defined behaviour, it is still a very
unclean/ignorant feature.
It is the only function I know of that behaves like this.

Charsets exist for good reasons,
and of course you can convert each value you get into the correct
charset.
But here is why this is generally a bad idea:
a) Everyone expects the function not to change the charset.
b) This is a big waste of CPU time.
   If you have millions of values you need millions of 
   function calls to reconvert them all!
   Depending on the application you write, this can become a 
   real problem.


I suggest rethinking this design decision, as it will become a problem
for many others over time.



------------------------------------------------------------------------

