ID: 41554
Updated by: [EMAIL PROTECTED]
Reported By: victorepand at gmail dot com
-Status: Open
+Status: Bogus
Bug Type: Strings related
Operating System: Linux
PHP Version: 4.4.7

New Comment:
Sorry, but your problem does not imply a bug in PHP itself. For a list of more appropriate places to ask for help using PHP, please visit http://www.php.net/support.php as this bug system is not the appropriate forum for asking support questions. Due to the volume of reports we cannot explain in detail here why your report is not a bug. The support channels will be able to provide an explanation for you.

Thank you for your interest in PHP.

Previous Comments:
------------------------------------------------------------------------

[2007-06-05 19:04:23] victorepand at gmail dot com

Those characters are Windows-1252 encoded because they were typed into WordPad on a Windows operating system. So now my question is: how can my script detect the encoding of a variable if it is unknown? For example, if I use mb_convert_encoding($testhtml,"UTF-8","auto"), I get an "Unable to detect character encoding" error. Here is an example of that:

<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial Characters: ©,’,—,“,”,®,™,… </body>\n</html>";
$testhtml=mb_convert_encoding($testhtml,"UTF-8","auto");
print $testhtml;
?>

Sample output: http://www.vacuumfoodsealer.info/utftest4.php

Warning: mb_convert_encoding() [function.mb-convert-encoding]: Unable to detect character encoding in /home/vgevge/public_html/vacuumfoodsealer/utftest4.php on line 3
Special Characters: �,�,�,�,�,�,�,�

------------------------------------------------------------------------

[2007-06-05 08:14:08] [EMAIL PROTECTED]

The page you linked to as "an example of an iso-8859-1 page" appears in fact to be already encoded in UTF-8. Treating UTF-8 bytes as ISO-8859-1 and attempting the conversion will result in the "incorrect" output you're seeing. Make sure that the source you're giving utf8_encode() is in fact ISO-8859-1 encoded.
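The detection question and the double-encoding point above can be handled together with a guard. A minimal sketch, assuming PHP 7+ with the mbstring extension and assuming the input is either valid UTF-8 or Windows-1252 (nothing here truly "detects" an encoding). As far as I know, "auto" expands to a language-dependent list that, with default settings, only tries ASCII and UTF-8, which is why the call above gives up; naming the source encoding explicitly avoids that. The function name `to_utf8` is hypothetical.

```php
<?php
// Hypothetical helper, not part of PHP. Assumption: input is either
// valid UTF-8 or Windows-1252 (e.g. text typed in WordPad).
function to_utf8(string $text): string
{
    // Already valid UTF-8 (this includes plain ASCII): return it unchanged,
    // so calling this on converted text never double-encodes it.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Otherwise name the source encoding explicitly instead of relying on
    // "auto", which with default settings does not try Windows-1252.
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// Windows-1252 bytes for ©, ’, ® and ™ (hypothetical sample input).
$legacy = "Special Characters: \xA9,\x92,\xAE,\x99";
$utf8   = to_utf8($legacy);

echo $utf8, "\n";                    // Special Characters: ©,’,®,™
var_dump(to_utf8($utf8) === $utf8);  // bool(true): a second pass is a no-op
```

The heuristic has a blind spot: some Windows-1252 byte sequences are also valid UTF-8 (the bytes C3 A9 are "Ã©" in Windows-1252 but "é" in UTF-8), so such input would be passed through unchanged.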
------------------------------------------------------------------------

[2007-06-04 22:43:14] victorepand at gmail dot com

Here are two short test scripts that demonstrate the problem:

<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial Characters: ©,’,—,“,”,®,™,… </body>\n</html>";
print $testhtml;
?>

The sample output is shown here: http://www.vacuumfoodsealer.info/utftest2.php

Special Characters: �,�,�,�,�,�,�,�

The result is garbled, which is correct in this case, because the content type of the page is UTF-8 and the characters are not encoded. However, the second test script:

<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial Characters: ©,’,—,“,”,®,™,… </body>\n</html>";
print utf8_encode($testhtml);
?>

produces this output: http://www.vacuumfoodsealer.info/utftest.php

Special Characters: ©,’,—,“,”,®,™,…

This time the characters have been encoded into UTF-8. Since the content type of the page is UTF-8 and the characters have been encoded into UTF-8, why should they appear garbled? And if it is not a bug in utf8_encode, what method would I use to correctly display these characters in UTF-8? I don't know of any function that will convert these characters!

------------------------------------------------------------------------

[2007-06-04 17:27:33] [EMAIL PROTECTED]

Thank you for this bug report. To properly diagnose the problem, we need a short but complete example script to be able to reproduce this bug ourselves. A proper reproducing script starts with <?php and ends with ?>, is max. 10-20 lines long and does not require any external resources such as databases, etc. If the script requires a database to demonstrate the issue, please make sure it creates all necessary tables, stored procedures etc. Please avoid embedding huge scripts into the report.
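On the question of which function converts these characters: utf8_encode() treats its input as ISO-8859-1, where bytes 0x80-0x9F are invisible C1 control codes, whereas Windows-1252 puts ’, “, ”, ™, … and similar characters in that range. The sketch below (PHP 7+, no extensions; the function names are hypothetical, not part of PHP) hand-rolls both conversions so the difference is visible. The byte values in `$sample` are reconstructed from the eight characters listed in the report. In practice, mb_convert_encoding($s, 'UTF-8', 'Windows-1252') or iconv('Windows-1252', 'UTF-8', $s) should do the same job.

```php
<?php
// Encode one code point (U+0000..U+FFFF is enough here) as UTF-8 bytes.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// What utf8_encode() does: each byte becomes the code point of the same value.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $out .= codepoint_to_utf8(ord($bytes[$i]));
    }
    return $out;
}

// Windows-1252: identical to ISO-8859-1 except for the bytes in this table.
function cp1252_to_utf8(string $bytes): string
{
    // Bytes 0x80-0x9F that Windows-1252 maps to printable characters.
    static $high = [
        0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
        0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
        0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
        0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
        0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
        0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
        0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
    ];
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $byte = ord($bytes[$i]);
        // Undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) fall back to Latin-1.
        $out .= codepoint_to_utf8($high[$byte] ?? $byte);
    }
    return $out;
}

// The eight characters from the report as Windows-1252 bytes (reconstructed).
$sample = "\xA9,\x92,\x97,\x93,\x94,\xAE,\x99,\x85";
echo latin1_to_utf8($sample), "\n"; // © and ® survive; the rest become control codes
echo cp1252_to_utf8($sample), "\n"; // ©,’,—,“,”,®,™,…
```

Falling back to the Latin-1 value for the five undefined bytes is a design choice similar to what web browsers do; a stricter converter might reject such input instead.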
------------------------------------------------------------------------

[2007-06-01 01:32:55] [EMAIL PROTECTED]

My gut reaction to your problem is to mention that you've probably mixed up ISO 8859-1 and Windows-1252: the two are commonly confused for each other, the Windows encoding containing several more characters (in the byte range 0x80-0x9F). However, said behavior does not precisely match up with your predicament, as © and ® are part of ISO 8859-1. Furthermore, the URL you supplied is already encoded in UTF-8. Perhaps you are double encoding?

Either way, this is not a problem with the documentation, except possibly the fact that the user notes on utf8_encode are waaaaay too long and some of the info needs to be integrated into the main docs.

------------------------------------------------------------------------

The remainder of the comments for this report is too long. To view the rest of the comments, please view the bug report online at http://bugs.php.net/41554