ID:               41554
 Updated by:       [EMAIL PROTECTED]
 Reported By:      victorepand at gmail dot com
-Status:           Open
+Status:           Bogus
 Bug Type:         Strings related
 Operating System: Linux
 PHP Version:      4.4.7
 New Comment:

Sorry, but your problem does not imply a bug in PHP itself.  For a
list of more appropriate places to ask for help using PHP, please
visit http://www.php.net/support.php as this bug system is not the
appropriate forum for asking support questions.  Due to the volume
of reports we can not explain in detail here why your report is not
a bug.  The support channels will be able to provide an explanation
for you.

Thank you for your interest in PHP.




Previous Comments:
------------------------------------------------------------------------

[2007-06-05 19:04:23] victorepand at gmail dot com

Those characters are windows-1252 encoded because they were typed into
Wordpad on a Windows operating system.

So now my question is, how can my script detect the coding of a
variable if it is unknown? For example, if I use this function:
mb_convert_encoding($testhtml,"UTF-8","auto"), I get an "Unable to
detect character encoding" error. Here is an example of that:

<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type
content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial
Characters: ©,’,—,“,”,®,™,…</body>\n</html>";
$testhtml=mb_convert_encoding($testhtml,"UTF-8","auto");
print $testhtml;
?>

Sample output:
http://www.vacuumfoodsealer.info/utftest4.php
Warning: mb_convert_encoding() [function.mb-convert-encoding]: Unable
to detect character encoding in
/home/vgevge/public_html/vacuumfoodsealer/utftest4.php on line 3
Special Characters:
&#65533;,&#65533;,&#65533;,&#65533;,&#65533;,&#65533;,&#65533;,&#65533;
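
One way to approach the question above, as a minimal sketch rather than
a definitive answer: check whether the bytes are already valid UTF-8
and convert only when they are not. The helper name to_utf8() and the
fallback of Windows-1252 are assumptions made for illustration; the
sketch also assumes the mbstring extension is loaded. An unknown
encoding cannot be detected reliably in general, so the fallback has to
come from knowledge of where the text originated.

```php
<?php
// Hypothetical helper (not part of PHP): return $text as UTF-8.
// Assumption: anything that is not valid UTF-8 is Windows-1252.
function to_utf8($text, $fallback = 'Windows-1252')
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

// The copyright sign, right single quote and long dash as raw
// Windows-1252 bytes.
$cp1252 = "Special Characters: \xA9,\x92,\x97";
$utf8 = to_utf8($cp1252);

print bin2hex($utf8) . "\n";
// A second pass must not re-encode text that is already UTF-8.
print (to_utf8($utf8) === $utf8 ? "unchanged" : "changed") . "\n";
?>
```

Checking validity first also guards against double encoding when the
input turns out to be UTF-8 already.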

------------------------------------------------------------------------

[2007-06-05 08:14:08] [EMAIL PROTECTED]

The page you linked to as "an example of an iso-8859-1 page" appears to
in fact be already encoded in utf-8. Treating utf-8 bytes as iso-8859-1
and attempting the conversion will result in the "incorrect" output
you're seeing. Make sure that the source you're giving utf8_encode() is
in fact iso-8859-1 encoded.
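
The effect described above can be reproduced with a short sketch. It
uses mb_convert_encoding() with an ISO-8859-1 source, which is assumed
here to be equivalent to utf8_encode() for this input, and it assumes
the mbstring extension is loaded.

```php
<?php
$utf8 = "\xC2\xA9"; // the copyright sign, already encoded as UTF-8

// Treat those UTF-8 bytes as ISO-8859-1 and convert them again.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

print bin2hex($utf8) . "\n";   // c2a9
print bin2hex($double) . "\n"; // c382c2a9 (U+00C2 followed by U+00A9)
?>
```

Each byte of the original sequence is converted as if it were a
separate ISO-8859-1 character, so one character becomes two.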

------------------------------------------------------------------------

[2007-06-04 22:43:14] victorepand at gmail dot com

Here are 2 short test scripts that demonstrate the problem:

<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type
content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial
Characters: ©,’,—,“,”,®,™,…</body>\n</html>";
print $testhtml;
?>

The sample output is shown here:
http://www.vacuumfoodsealer.info/utftest2.php
Special Characters:
&#65533;,&#65533;,&#65533;,&#65533;,&#65533;,&#65533;,&#65533;,&#65533;

The result is garbled, which is correct in this case, because the
content-type of the page is UTF-8 and the characters are not encoded as
UTF-8.

However, the second test script:
<?php
$testhtml="<html>\n<head>\n<META http-equiv=Content-Type
content=\"text/html; charset=UTF-8\">\n</head>\n<body>\nSpecial
Characters: ©,’,—,“,”,®,™,…</body>\n</html>";
print utf8_encode($testhtml);
?>

Produces this output here:
http://www.vacuumfoodsealer.info/utftest.php
Special Characters: ©,&#146;,&#151;,&#147;,&#148;,®,&#153;,&#133;

This time the characters have been encoded into UTF-8. Since the
content-type of the page is UTF-8 and the characters have been encoded
into UTF-8, then why should they appear garbled? And if it is not a bug
with utf8_encode, then what method would I use to correctly display
these characters in UTF-8? I don't know of any function that will
convert these characters!

------------------------------------------------------------------------

[2007-06-04 17:27:33] [EMAIL PROTECTED]

Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.



------------------------------------------------------------------------

[2007-06-01 01:32:55] [EMAIL PROTECTED]

My gut reaction to your problem is to mention that you've probably
mixed up ISO 8859-1 and Windows-1252: the two are commonly confused for
each other, the Windows encoding containing several more characters:
€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ

However, said behavior does not precisely match up with your
predicament, as © and ® are part of ISO 8859-1. Furthermore, the URL you
supplied is already encoded in UTF-8. Perhaps you are double encoding?

Either way, this is not a problem with the documentation, except
possibly the fact that the user notes are waaaaay too long on
utf8_encode and some of the info needs to be integrated into the main
docs.
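
To illustrate the difference between the two encodings, here is a
minimal sketch that converts the single byte 0x92 under each
assumption. It assumes the mbstring extension is loaded and that the
'Windows-1252' encoding name is available in the PHP build in use.

```php
<?php
// 0x92 is a right single quote in Windows-1252, but a C1 control
// character in ISO 8859-1.
$byte = "\x92";

$as_latin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
$as_cp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

print bin2hex($as_latin1) . "\n"; // c292   (U+0092, control character)
print bin2hex($as_cp1252) . "\n"; // e28099 (U+2019, right single quote)
?>
```

Decimal 146 is 0x92, so a conversion that assumes ISO 8859-1 would
plausibly account for the &#146; seen in the output quoted elsewhere in
this report.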

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/41554

-- 
Edit this bug report at http://bugs.php.net/?id=41554&edit=1
