ID:               41554
 User updated by:  victorepand at gmail dot com
 Reported By:      victorepand at gmail dot com
-Status:           Feedback
+Status:           Open
 Bug Type:         Strings related
 Operating System: Linux
 PHP Version:      4.4.7
 New Comment:

Here are 2 short test scripts that demonstrate the problem:

<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type
content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial
Characters: ©,’,—,“,”,®,™,…</body>\n</html>";
print $testhtml;
?>

The sample output is shown here:
http://www.vacuumfoodsealer.info/utftest2.php
Special Characters:
&#65533;,&#65533;,&#65533;,&#65533;,&#65533;,&#65533;,&#65533;,&#65533;

The result is garbled, which is correct in this case: the page declares
UTF-8 as its content-type, but the characters have not been encoded as
UTF-8.

However, the second test script:
<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type
content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial
Characters: ©,’,—,“,”,®,™,…</body>\n</html>";
print utf8_encode($testhtml);
?>

Produces this output here:
http://www.vacuumfoodsealer.info/utftest.php
Special Characters: ©,&#146;,&#151;,&#147;,&#148;,®,&#153;,&#133;

This time the characters have been encoded into UTF-8. Since the page
declares UTF-8 as its content-type and the characters have been encoded
into UTF-8, why do they still appear garbled? And if this is not a bug
in utf8_encode, what method should I use to display these characters
correctly in UTF-8? I don't know of any function that will convert
these characters!
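
One possible approach, given only as a sketch and not as a confirmed fix
for this report: if the source bytes are actually Windows-1252 rather
than ISO 8859-1 (an assumption), iconv() can do the conversion in one
call. The byte values below are illustrative stand-ins for the special
characters in the test scripts, and the iconv extension is assumed to be
installed.

```php
<?php
// Same special characters as the test scripts, written as explicit
// Windows-1252 byte values so this file's own encoding does not matter.
$testhtml = "Special Characters: \xA9,\x92,\x97,\x93,\x94,\xAE,\x99,\x85";

// Assumption: the source is Windows-1252, not ISO 8859-1.
$utf8 = iconv('CP1252', 'UTF-8', $testhtml);

// iconv() returns false on failure, so check before printing.
if ($utf8 === false) {
    die("Conversion failed\n");
}
print $utf8;
?>
```

Unlike a hand-written replacement table, this covers every character
that Windows-1252 defines, so none can be missed.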


Previous Comments:
------------------------------------------------------------------------

[2007-06-04 17:27:33] [EMAIL PROTECTED]

Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.



------------------------------------------------------------------------

[2007-06-01 01:32:55] [EMAIL PROTECTED]

My gut reaction to your problem is to mention that you've probably
mixed up ISO 8859-1 and Windows-1252: the two are commonly confused with
each other, and the Windows encoding contains several more characters:
€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ

However, said behavior does not precisely match up with your
predicament, as © and ® are part of ISO 8859-1. Furthermore, the URL you
supplied is already encoded in UTF-8. Perhaps you are double encoding?
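
Both possibilities can be illustrated with a minimal sketch. It assumes
the iconv extension is available and uses hand-picked byte values, not
data from the reporter's page. iconv's ISO-8859-1 conversion stands in
for utf8_encode here, since both map each byte to the code point of the
same number.

```php
<?php
// Illustrative bytes only: copyright sign, right single quote and
// em dash as they are stored in Windows-1252.
$cp1252 = "\xA9,\x92,\x97";

// Read as ISO 8859-1, 0x92 and 0x97 become the C1 control characters
// U+0092 and U+0097, which browsers do not render as punctuation.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $cp1252);

// Read as Windows-1252, they become U+2019 and U+2014 as intended.
$asCp1252 = iconv('CP1252', 'UTF-8', $cp1252);

// Double encoding: converting text that is already UTF-8 a second time
// turns every byte of each multi-byte sequence into its own character.
$double = iconv('ISO-8859-1', 'UTF-8', $asCp1252);

echo bin2hex($asLatin1), "\n"; // c2a92cc2922cc297
echo bin2hex($asCp1252), "\n"; // c2a92ce280992ce28094
echo bin2hex($double), "\n";   // longer still: each byte was re-encoded
?>
```

The copyright sign comes out the same in the first two conversions
because 0xA9 means the same thing in both encodings. Only the bytes in
the 0x80-0x9F range differ.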

Either way, this is not a problem with the documentation, except
possibly the fact that the user notes on utf8_encode are waaaaay too
long and some of the info needs to be integrated into the main docs.

------------------------------------------------------------------------

[2007-06-01 00:57:31] victorepand at gmail dot com

Description:
------------
I have used the function utf8_encode to encode iso-8859-1 pages into
UTF-8 and displayed them on my site, but strange and funny characters
are appearing such as "" and "Â".

It turns out that the iso-8859-1 page contains characters such as
these:
©,’,—,“,”,®,™,…
These characters display fine on my browser from the iso-8859-1 page,
but when I use the utf8_encode function and display it on my utf-8 page,
the result is garbled.

So I have found the only solution is to manually convert all of the
characters above before using the utf8_encode function and that solves
the problem crudely, but it is not a perfect solution. What if I have
missed any characters? Isn't there a cleaner method, a PHP function,
that will do all this conversion without worry and without missing any
characters?



Reproduce code:
---------------
Here is an example of an iso-8859-1 page which displays fine on my
browser, but contains characters such as the ©,’,—,“,”,®,™,… mentioned
above:
http://www.jardenstore.com/product.aspx?bid=18&pid=1251


Expected result:
----------------
After using the utf8_encode function, I expected to see the page
displaying correctly again on my UTF-8 page with these characters
intact: ©,’,—,“,”,®,™,… 

Actual result:
--------------
Instead, the result was garbled like this:
‘,—,–,’,Â,â€â„¢,â€â„¢,â€,é,ð,™,œ,,è,Ž,Â


------------------------------------------------------------------------

