Edit report at https://bugs.php.net/bug.php?id=60884&edit=1

 ID:                 60884
 Updated by:         ras...@php.net
 Reported by:        t dot nickl at exse dot de
 Summary:            htmlentities() behaves differently and thus breaks
                     existing code
 Status:             Bogus
 Type:               Bug
 Package:            *General Issues
 Operating System:   CentOS 4.4
 PHP Version:        5.4.0RC6
 Block user comment: N
 Private report:     N

 New Comment:

I know it hurts, but we really need to move away from ISO-8859-1 and towards 
UTF-8 as the default charset of the Web. We have chosen to take the hit in 5.4. 
The documentation has carried a warning about this impending change for quite a 
while urging people to specify a charset.

For PHP 5.4 compatibility Typo3 should either hardcode iso-8859-1 or they should
change their calls to:

  htmlentities($a,NULL,'')

to pick up the default script-encoding charset.
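
As a minimal sketch of the first option (passing the charset explicitly), assuming the data really is ISO-8859-1. The variable names are illustrative, and the umlaut is written as a byte escape so the result does not depend on how the script file itself is saved:

```php
<?php
// "\xE4" is the single ISO-8859-1 byte for 'ä'; using an escape keeps the
// example independent of the encoding of this source file.
$latin1 = "Rechnungsadresse \xE4ndern";

// Passing the charset explicitly gives the same result whatever the default is.
$escaped = htmlentities($latin1, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

echo $escaped, "\n"; // Rechnungsadresse &auml;ndern
```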


Previous Comments:
------------------------------------------------------------------------
[2012-01-25 18:01:23] johan...@php.net

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

In PHP 5.4 the default_charset php.ini option was set to utf-8. You can 
override this in php.ini or .htaccess or such.
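
Whether a call that omits the charset argument is affected by default_charset appears to depend on the PHP version, so code that wants to follow the configured value can read it explicitly. A hedged sketch follows; escape_with_configured_charset() is a hypothetical helper for illustration, not part of PHP:

```php
<?php
// Hypothetical helper: escape using whatever default_charset is configured
// (php.ini, .htaccess, or ini_set), falling back to UTF-8 when it is unset.
function escape_with_configured_charset($text)
{
    $charset = ini_get('default_charset');
    if ($charset === false || $charset === '') {
        $charset = 'UTF-8';
    }
    return htmlentities($text, ENT_COMPAT | ENT_HTML401, $charset);
}

// Runtime override for legacy data ("\xE4" is 'ä' in ISO-8859-1).
ini_set('default_charset', 'ISO-8859-1');
$legacy = escape_with_configured_charset("\xE4ndern");

echo $legacy, "\n"; // &auml;ndern
```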

------------------------------------------------------------------------
[2012-01-25 15:29:09] t dot nickl at exse dot de

Description:
------------
//This code must be run via web:

//This is a string from e.g. some database containing a German umlaut 'ä'.
//Note the encoding really is ISO-8859-1. It's just assigned here literally to be
//concise.
$a = "Rechnungsadresse ändern";

//this output works: (An empty string activates some autodetection)
var_dump(htmlentities($a, ENT_COMPAT | ENT_HTML401, ''));

//this works too (the same output is generated):
var_dump(htmlentities($a, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1'));

//this does NOT work (outputs empty string)
var_dump(htmlentities($a));

// Reason: PHP changed the charset htmlentities() uses when you do NOT pass one
// (as 90% of the code out there does):

//determine_charset() :
///////////////////////////////////////////////////////
// php-5.2.1/ext/standard/html.c :
//    /* Guarantee default behaviour for backwards compatibility */
//    if (charset_hint == NULL)
//        return cs_8859_1;
/////////////////////////////////////////////////////
// php-5.4.0RC4/ext/standard/html.c :
//   /* Default is now UTF-8 */
//   if (charset_hint == NULL)
//        return cs_utf_8;

// This breaks the meaning of existing German code. For example, typo3 outputs an
// empty string if end users used German umlauts in the rich text editor in the
// backend.

// Please change determine_charset() back to using cs_8859_1 if the third
// parameter of htmlentities() is omitted.
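
The empty-string result described above can be reproduced without relying on how the script file is saved. This is a sketch under two assumptions: the input is ISO-8859-1, and the iconv extension is available for the conversion step. UTF-8 is named explicitly to stand in for the new default:

```php
<?php
// 'ä' as a single ISO-8859-1 byte; on its own this byte is not valid UTF-8.
$latin1 = "Rechnungsadresse \xE4ndern";

// Interpreting the bytes as UTF-8 fails, and htmlentities() returns ''.
$asUtf8 = htmlentities($latin1, ENT_COMPAT | ENT_HTML401, 'UTF-8');
var_dump($asUtf8 === '');

// Naming the real charset of the data restores the expected entity.
$asLatin1 = htmlentities($latin1, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');
var_dump($asLatin1);

// Alternatively, convert the data to UTF-8 first and escape it as UTF-8.
$converted = iconv('ISO-8859-1', 'UTF-8', $latin1);
$fromConverted = htmlentities($converted, ENT_COMPAT | ENT_HTML401, 'UTF-8');
var_dump($fromConverted);
```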

Test script:
---------------
See description.

Expected result:
----------------
See description.

Actual result:
--------------
See description.


------------------------------------------------------------------------


