Edit report at http://bugs.php.net/bug.php?id=52712&edit=1

 ID:                 52712
 Updated by:         ahar...@php.net
 Reported by:        matias dot perrone at gmail dot com
 Summary:            html_entity_decode does not support all standard
                     entities
-Status:             Open
+Status:             Bogus
 Type:               Bug
 Package:            Strings related
 Operating System:   Windows 7
 PHP Version:        5.2.14
 Block user comment: N

 New Comment:

html_entity_decode() can only decode entities whose characters exist in
the given character set. None of the characters in your example occur in
ISO-8859-1, so they have to be left as entities. To see this in action:
if you change the character set to ISO-8859-15, the &euro; entity does
get correctly decoded, since ISO-8859-15 added the € character to
ISO-8859-1.
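
A minimal sketch of that comparison, assuming the entity in question is
written as &euro; (the variable names are only illustrative). The
ISO-8859-15 result is printed as hex because it is a single non-ASCII
byte:

```php
<?php
// Assumption: '&euro;' stands in for the euro entity from the test script.
$entity = '&euro;';

// ISO-8859-1 has no euro sign, so the entity should come back unchanged.
$latin1 = html_entity_decode($entity, ENT_QUOTES, 'ISO-8859-1');

// ISO-8859-15 places the euro sign at byte 0xA4, so the entity can be decoded.
$latin9 = html_entity_decode($entity, ENT_QUOTES, 'ISO-8859-15');

echo 'ISO-8859-1:  ', $latin1, "\n";
echo 'ISO-8859-15: ', bin2hex($latin9), "\n";
```

The other entities in the report (the curly quotes and the circumflex)
have no byte in ISO-8859-15 either, so they would still be left alone
with that character set.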



You'd be much better off using a Unicode character set like UTF-8,
since that can represent all of the characters defined by HTML
entities.
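
As a rough illustration of that suggestion, here is the test script with
only the character set changed to UTF-8. The entity names are an
assumption reconstructed from the characters shown in the report, and
the output is UTF-8 encoded, so the page or terminal has to be UTF-8 as
well:

```php
<?php
// Assumption: these are the entity names behind the characters in the report.
$entities = '&rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;';

// UTF-8 can represent every character named by an HTML 4.01 entity,
// so none of them should be left undecoded.
$decoded = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');

echo 'Start:  ', $entities, "\n";
echo 'Result: ', $decoded, "\n";
```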



Not a bug; closing.


Previous Comments:
------------------------------------------------------------------------
[2010-08-27 06:01:15] matias dot perrone at gmail dot com

Description:
------------
The function "html_entity_decode" does not support all HTML entities as
documented in http://www.w3.org/TR/html4/sgml/entities.html

Test script:
---------------
$sEntities = '&rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;';
echo "Start: ".$sEntities."\n";
$sEntities = html_entity_decode($sEntities, ENT_QUOTES, "ISO-8859-1");
echo "Result: ".$sEntities;

Expected result:
----------------
Start: &rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;
Result: ’ ‘ “ ” € ˆ

Actual result:
--------------
Start: &rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;
Result: &rsquo; &lsquo; &ldquo; &rdquo; &euro; &circ;

------------------------------------------------------------------------


